Abstract

In this work, we investigate the problem of private statistical analysis in the distributed
and semi-honest setting. In particular, we study properties of Private Stream Aggregation
schemes, first introduced by Shi et al. [27]. These are computationally secure protocols
for the aggregation of data in a network and have a very small communication cost. We
show that such schemes can be built upon any key-homomorphic weak pseudo-random
function. Thus, in contrast to the aforementioned work, our security definition can be
achieved in the standard model. In addition, we give a computationally efficient
instantiation of this protocol based on the Decisional Diffie-Hellman problem. Moreover,
we show that every mechanism which preserves $(\epsilon, \delta)$-differential privacy
provides computational $(\epsilon, \delta)$-differential privacy when it is executed
through a Private Stream Aggregation scheme. Finally, we introduce a novel perturbation
mechanism based on the Skellam distribution that is suited for the distributed setting,
and compare its performance with that of previous solutions.


1 Introduction 

The framework of statistical disclosure control aims at providing strong privacy guarantees
for the records stored in a database while enabling accurate statistical analyses to
be performed. In recent years, differential privacy has become one of the most important 
paradigms for privacy-preserving statistical analyses. Generally, the notion of differential 
privacy is considered in the centralised setting where we assume the existence of a trusted 
curator la emu eh who collects data in the clear, perturbs it properly (e.g. by adding 
Laplace noise) and publishes it. In this way, the output statistics are not significantly 
influenced by the presence (resp. absence) of a particular record in the database. 

In this work, we study how to preserve differential privacy when we cannot rely on a
trusted curator. In this so-called distributed setting, the users have to send their data to
an untrusted aggregator. Preserving differential privacy and achieving high accuracy in
the distributed setting is of course harder than in the centralised setting, since the users
have to execute a perturbation mechanism on their own. In order to achieve the same
accuracy as provided by well-known techniques in the centralised setting, the work by
Shi et al. [27] introduces the Private Stream Aggregation (PSA) scheme, a cryptographic
protocol which enables each user to securely send encrypted time-series data to an
aggregator. The aggregator is then able to decrypt the aggregate of all data in each
time-step, but cannot retrieve any further information about the individual data. Using
such a protocol, the task of perturbation can be split among the users, such that
differential privacy is preserved and high accuracy is guaranteed. For a survey of
applications of this protocol, we refer to m.

In [27], a PSA scheme for sum queries is provided and some strong security guarantees 
under the Decisional Diffie-Hellman assumption are shown. However, this instantiation 
has some limitations. First, the security only holds in the random oracle model; second, 
its decryption algorithm requires the solution of the discrete logarithm in a given range, 
which can be very time-consuming if the number of users and the plaintext space are 
large. Moreover, since a PSA scheme provides computational security, the perturbation 
mechanism in use can only provide a computational version of differential privacy, a 
notion first introduced by Mironov et al. [22]. In [27], however, a connection between
the security of a PSA scheme and differential privacy is not explicitly shown. In a
subsequent work by Chan et al. [7], this connection is still not completely established, since
the polynomial-time reduction between an attacker against a PSA scheme and a database 
distinguisher is missing. 

Objectives. In order to overcome the limitations of the construction in [27], in this
work we address the following problems. 

• We want to give sufficient conditions for a PSA scheme which has a security guarantee
in the standard model.

• According to these conditions, we want to construct a concrete instantiation consisting
of efficient algorithms, even when the number of users and the plaintext space become
large.

• We aim at showing that an information-theoretically differentially private mechanism
preserves computational differential privacy when it is executed through a
(computationally) secure PSA scheme.

• We want to investigate differentially private mechanisms suitable for an execution 
through a PSA scheme. 

Contributions. We achieve the aforementioned goals in the following manner. In order 
to derive sufficient conditions for PSA schemes with a certain security guarantee, we 
lower the requirements of Aggregator Obliviousness from [27] by removing the attacker’s
possibility to adaptively compromise users during the execution of a PSA scheme with 
time-series data. We show that a PSA scheme for achieving this lower security level 
can be built upon any key-homomorphic weak pseudo-random function. Since weak 
pseudo-randomness can be achieved in the standard model, this condition also enables 
secure schemes in the standard model. In particular, we can build a key-homomorphic 
weak pseudo-random function based on the Decisional Diffie-Hellman assumption in the 
group of quadratic residues modulo a squared safe prime. This function is used for 
the construction of a PSA scheme for sum queries. By comparing the running times
and practical performance of our PSA scheme and the one given by Shi et al. [27]
at the same security level, we find that our solution provides a significant speed-up for 
decryption when the plaintext space is large while decelerating the encryption only by a 
constant factor. 

Reduction-based security proofs for cryptographic schemes usually require an attacker
in the corresponding security game to send two different plaintexts (or plaintext
collections) to a challenger. The adversary then receives a ciphertext which is the
encryption of one of these collections and has to guess which one it is. In any security
definition for a PSA scheme, these collections must satisfy a particular requirement (i.e.
they must lead to the same aggregate), since the attacker has the capability to decrypt
them (different aggregates would make the adversary’s task trivial). In general, however,
this requirement cannot be satisfied in the context of differential privacy, where the two
collections are adjacent databases whose aggregates may differ. Introducing a novel kind
of security reduction which deploys a biased coin flip, we can show that, whenever a
randomised perturbation procedure is involved in a PSA scheme, the requirement of
having collections with equal aggregate can be abolished. This result can be generalised
to any cryptographic scheme with such a requirement. Using this property, we are able
to show that if a mechanism preserves differential privacy, then it preserves
computational differential privacy when it is used as a randomised perturbation
procedure in a PSA scheme.

Finally, we compare three mechanisms: the Geometric mechanism from [27], the Binomial
mechanism from [5] and the Skellam mechanism introduced in this work. All three
mechanisms preserve differential privacy and make use of discrete probability
distributions. Therefore, they are well-suited for an execution through a PSA scheme. For
generating the right amount of noise among all users, these mechanisms apply two
different approaches. While in the Geometric mechanism, with high probability, only one
user generates the noise necessary for differential privacy, the Binomial and Skellam
mechanisms allow all users to generate noise of small variance, which sums up to the
required value for privacy. We show that for high privacy levels, the theoretical error
bound of the Skellam mechanism is slightly better than that of the other two. At the same
time, we provide experimental results showing that the Geometric and Skellam
mechanisms have comparable accuracy in practice, and both outperform the Binomial
mechanism.

Related Work. As pointed out above, our contributions are mostly related to the work of
Shi et al. [27] and Chan et al. [7]. Privacy-preserving aggregation of time-series data in
the presence of an untrusted aggregator has also been studied in various other works,
e.g. [31 HI], IB 13 [H|. Beimel et al. [3] and Eigner et al. [H] show that secure
multi-party computation techniques can be used for data aggregation under differential
privacy. These techniques usually have a high communication cost, whereas PSA requires
each user to send exactly one message per time-step. The protocol given by Rastogi et al.
is based on the threshold Paillier cryptosystem. It requires an extra round of interaction
between the users and the aggregator in every time-step in order to decrypt the sum
queries. In contrast, PSA requires the users to interact with the aggregator only for
sending the ciphertexts. Acs et al. [2] use an additively homomorphic encryption scheme
for sending time-series data, but it requires the generation of a pair of
encryption/decryption keys for each pair of users. Moreover, reusing key pairs for
different time-steps potentially leads to security breaches. In a PSA scheme, each user
gets only one encryption key, which can be securely used for every time-step. Using
additively homomorphic encryption, Rieffel et al. [25] construct a scheme which does not
require extra rounds of interaction, but is not fully resistant against collusions, and whose
computation and storage costs scale roughly with the number of compromised users that
the system can tolerate. Li et al. mm use the homomorphic encryption scheme given by
Castelluccia et al. m in order to construct an efficient protocol for sending data in mobile
sensing applications. This scheme is resistant against collusions, but each user has to
store multiple keys, depending on the number of compromised users in the network.
Moreover, for encryption and decryption the scheme requires the computation of as many
pseudo-random values as the number of keys in the network, making the computational
effort for the analyst rather high. Thus, the costs of this scheme depend on the number of
compromised users. A PSA scheme is fully resistant against any number of collusions
and, furthermore, we provide a solution where the computation and storage costs are
independent of the number of users. Joye et al. m provide a protocol with the same
security guarantees as in [27] in the random oracle model. The security of their scheme
relies on the DCR assumption (rather than DDH as in [27]) and, as a result, in the
security reduction they can remove a factor which is cubic in the number of users.
However, their scheme involves a trusted party for setting some public parameters. In
this work, we provide an instantiation of our generic PSA construction which is similar to
the one in m but relies on the DDH assumption. While in our generic security reduction
we cannot avoid the cubic factor in the number of users, our construction does not
involve any trusted party and has security guarantees in the standard model.

Another series of works deals with a distributed generation of noise for preserving 
differential privacy. Dwork et al. [5] consider the Gaussian distribution for splitting 
the task of noise generation among all users. Their proposed scheme requires more 
interactions between the users than our solution. In [2], privacy-preserving data
aggregation is applied to smart metering and the generation of Laplace noise is
performed in a distributed manner, since each meter simply generates the difference of
two Gamma distributed random variables as a share of a Laplace distributed random
variable. In [25], each user generates a share of Laplace noise by generating a vector of
four Gaussian random variables. For a survey of the mechanisms given in [25] and [2], we
refer to m. However, the aforementioned mechanisms generate noise drawn according to
continuous distributions, whereas the use in a PSA scheme requires discrete noise.
Therefore, we consider suitable discrete distributions and compare their performance for
private statistical analyses.

2 Preliminaries 

2.1 Problem statement 

In this work, we consider a distributed and semi-honest setting where $n$ users are asked
to participate in some statistical analyses but do not trust the data analyst (or
aggregator), who is assumed to be honest but curious. Therefore, the users cannot provide
their own data in the clear. Moreover, they communicate solely and independently with
the untrusted aggregator, who wants to analyse the users’ data by means of queries in
time-series and aims at obtaining answers as accurate as possible. More specifically,
assume that the data items belong to a data universe $\mathcal{D}$. For a sequence of
time-steps $t \in T$, where $T$ is a discrete time period, the analyst sends queries which
are answered by the users in a distributed manner. Each query is modelled as a function
$f: \mathcal{D}^n \to O$ for a finite or countably infinite set $O$ of possible outputs (i.e.
answers to the query).
We also assume that some users may act in order to compromise the privacy of the
other participants. More precisely, we assume the existence of a publicly known constant
$\gamma \in (0, 1]$ which is the a priori estimate of the lower bound on the fraction of
non-compromised users who honestly follow the protocol and want to release useful
information about their data (with respect to a particular query $f$), while preserving
$(\epsilon, \delta)$-differential privacy. The remaining $(1 - \gamma)$-fraction of users is
assumed to be compromised. Compromised users honestly follow the protocol but aim at
violating the privacy of non-compromised users. For that purpose, these users form a
coalition with the analyst and send her auxiliary information, e.g. their own data in the
clear.
For computing the answers to the aggregator’s queries, a special cryptographic protocol,
called Private Stream Aggregation (first introduced in [27]), is used by all users. In
connection with a perturbation mechanism, this scheme ensures that the analyst is only
able to learn a noisy aggregate of the users’ data (as close as possible to the real answer
$f(D)$) and nothing else. In contrast to common secure multi-party techniques
QJH3H0], this protocol requires each user to send only one message per query to the
analyst.

2.2 Definitions 

We consider a database as an element $D \in \mathcal{D}^n$, where $\mathcal{D}$ is the data
universe and $n$ is the number of users. Since $D$ may contain sensitive information, the
users want to protect their privacy. Therefore, a privacy-preserving mechanism must be
applied. Unless stated otherwise, we always assume that a mechanism is applied in the
distributed setting. Differential privacy m is a well-established notion for
privacy-preserving statistical analyses. We recall that a randomised mechanism preserves
differential privacy if its application on two adjacent databases, i.e. databases which
differ in one entry only, leads to close output distributions.

Definition 1 (Differential Privacy [10]). Let $\mathcal{R}$ be a (possibly infinite) set and
let $n \in \mathbb{N}$. A randomised mechanism $\mathcal{A}: \mathcal{D}^n \to \mathcal{R}$
preserves $(\epsilon, \delta)$-differential privacy (short: DP) if for all adjacent databases
$D_0, D_1 \in \mathcal{D}^n$ and all $R \subseteq \mathcal{R}$:

$$\Pr[\mathcal{A}(D_0) \in R] \le e^{\epsilon} \cdot \Pr[\mathcal{A}(D_1) \in R] + \delta.$$

The probability space is defined over the randomness of $\mathcal{A}$.

The additional parameter $\delta$ is necessary for mechanisms which cannot preserve
$\epsilon$-DP (i.e. $(\epsilon, 0)$-DP) in certain cases. However, if the probability that
these cases occur is bounded by $\delta$, then the mechanism preserves
$(\epsilon, \delta)$-DP.
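As a small numerical illustration of Definition 1 (our own example, not part of the text),
the sketch below assumes additive two-sided geometric noise, $\Pr[Y = k] \propto r^{|k|}$
with $r = e^{-\epsilon/\Delta}$, added to an integer query whose answers on adjacent
databases differ by at most $\Delta$. It computes the largest pointwise log-ratio of output
probabilities over a finite window of outputs; a pointwise bound
$\Pr[\mathcal{A}(D_0) = o] \le e^{\epsilon} \Pr[\mathcal{A}(D_1) = o]$, summed over
$o \in R$, yields Definition 1 with $\delta = 0$. The function names and the output window
are hypothetical choices.

```python
import math


def two_sided_geometric_pmf(k, r):
    """Pr[Y = k] = (1 - r) / (1 + r) * r**|k| for integer k and 0 < r < 1."""
    return (1 - r) / (1 + r) * r ** abs(k)


def worst_log_ratio(f0, f1, eps, delta_f, outputs=range(-200, 201)):
    """Largest |ln(Pr[A(D0) = o] / Pr[A(D1) = o])| over a finite window of outputs o,
    where A(D) = f(D) + Y, f(D0) = f0, f(D1) = f1 and r = exp(-eps / delta_f)."""
    r = math.exp(-eps / delta_f)
    worst = 0.0
    for o in outputs:
        p0 = two_sided_geometric_pmf(o - f0, r)
        p1 = two_sided_geometric_pmf(o - f1, r)
        worst = max(worst, abs(math.log(p0 / p1)))
    return worst


# Answers on adjacent databases differing by exactly Delta = 3:
print(worst_log_ratio(10, 13, eps=0.5, delta_f=3))  # about 0.5, i.e. the ratio never exceeds e^eps
```

The worst case is attained as soon as the output lies on the same side of both answers, where
the ratio is exactly $r^{\Delta} = e^{-\epsilon}$.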

In the literature, there are well-established mechanisms for preserving differential
privacy, e.g. the Laplace mechanism m and the Exponential mechanism m. In order to
privately evaluate a query, these mechanisms draw noisy values according to some
distribution depending on the query’s global sensitivity.

Definition 2 (Global Sensitivity [10]). The global sensitivity $S(f)$ of a query
$f: \mathcal{D}^n \to \mathbb{R}^k$ is defined as

$$S(f) = \max_{D_0, D_1 \text{ adjacent}} \|f(D_0) - f(D_1)\|_1.$$


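In particular, we will consider sum queries $f_{\mathcal{D}}: \mathcal{D}^n \to \mathbb{Z}$
defined as $f_{\mathcal{D}}(D) := \sum_{i=1}^{n} d_i$, for
$D = (d_1, \dots, d_n) \in \mathcal{D}^n$ and $\mathcal{D} \subseteq \mathbb{Z}$.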
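For sum queries over an integer interval $\mathcal{D} = \{w, \dots, w'\}$, changing a single
entry moves the sum by at most $w' - w$, so $S(f_{\mathcal{D}}) = w' - w$. The brute-force
sketch below (hypothetical helper names, feasible only for tiny $n$ and $\mathcal{D}$)
checks this directly against Definition 2.

```python
from itertools import product


def sum_query(db):
    # f_D(D) = d_1 + ... + d_n
    return sum(db)


def global_sensitivity(query, universe, n):
    """Brute-force S(f) = max |f(D0) - f(D1)| over adjacent D0, D1 (scalar-valued f)."""
    best = 0
    for db in product(universe, repeat=n):
        for i in range(n):
            for v in universe:
                # Replacing entry i only gives an adjacent database.
                neighbour = db[:i] + (v,) + db[i + 1:]
                best = max(best, abs(query(db) - query(neighbour)))
    return best


# Data universe {-2, ..., 3}, i.e. w = -2 and w' = 3:
print(global_sensitivity(sum_query, range(-2, 4), 3))  # 5 = w' - w
```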

For measuring how well the output of a mechanism estimates the real data with respect
to a particular query, we use the notion of $(\alpha, \beta)$-accuracy.

Definition 3 (Accuracy [29]). The output of a mechanism $\mathcal{A}$ achieves
$(\alpha, \beta)$-accuracy for a query $f: \mathcal{D}^n \to \mathbb{R}$ if for all
$D \in \mathcal{D}^n$:

$$\Pr[|\mathcal{A}(D) - f(D)| \le \alpha] \ge 1 - \beta.$$

The probability space is defined over the randomness of $\mathcal{A}$.
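For an additive mechanism $\mathcal{A}(D) = f(D) + Y$, the accuracy of Definition 3
depends only on the noise $Y$. Assuming two-sided geometric noise
$\Pr[Y = k] \propto r^{|k|}$ (as in the Geometric mechanism, with $r = e^{-\epsilon/S(f)}$),
one has $\Pr[Y > a] = r^{a+1}/(1+r)$ for integers $a \ge 0$, hence
$\beta = 2r^{\alpha+1}/(1+r)$. The sketch below (hypothetical function names) evaluates
this closed form and cross-checks it by Monte Carlo with NumPy.

```python
import numpy as np


def geometric_beta(alpha, r):
    """beta with Pr[|Y| <= alpha] = 1 - beta for two-sided geometric Y,
    Pr[Y = k] proportional to r**|k|, and integer alpha >= 0.

    Pr[Y > alpha] = r**(alpha + 1) / (1 + r); the distribution is symmetric,
    so both tails together give twice that value.
    """
    return 2 * r ** (alpha + 1) / (1 + r)


def empirical_beta(alpha, r, samples=200_000, seed=0):
    """Monte Carlo estimate of Pr[|Y| > alpha]."""
    rng = np.random.default_rng(seed)
    # The difference of two i.i.d. geometric variables with success probability
    # 1 - r is two-sided geometric with parameter r.
    y = rng.geometric(1 - r, samples) - rng.geometric(1 - r, samples)
    return float(np.mean(np.abs(y) > alpha))


print(geometric_beta(3, 0.5))  # 1/12, about 0.0833
print(empirical_beta(3, 0.5))  # should be close to the closed form
```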

The use of a cryptographic protocol for transferring data provides a computational 
security level. If such a protocol is applied for preserving differential privacy, this implies 
that only a computational level of differential privacy can be provided. Our definition of 
computational differential privacy follows the notion of Chan et al. [7].

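Definition 4 (Computational Differential Privacy [7]). Let $\kappa$ be a security
parameter and $n \in \mathbb{N}$ with $n = \mathrm{poly}(\kappa)$. A randomised
mechanism $\mathcal{A}: \mathcal{D}^n \to \mathcal{R}$ preserves computational
$(\epsilon, \delta)$-differential privacy (short: CDP) if for all adjacent databases
$D_0, D_1 \in \mathcal{D}^n$ and all probabilistic polynomial-time distinguishers
$\mathcal{D}_{\mathrm{CDP}}$:

$$\Pr[\mathcal{D}_{\mathrm{CDP}}(1^{\kappa}, \mathcal{A}(D_0)) = 1] \le e^{\epsilon} \cdot \Pr[\mathcal{D}_{\mathrm{CDP}}(1^{\kappa}, \mathcal{A}(D_1)) = 1] + \delta + \mathrm{neg}(\kappa),$$

where $\mathrm{neg}(\kappa)$ is a negligible function in $\kappa$. The probability space is
defined over the randomness of $\mathcal{A}$ and $\mathcal{D}_{\mathrm{CDP}}$.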

The notion of computational differential privacy is a natural extension of the
information-theoretical definition to computationally bounded distinguishers, in the
spirit of computational indistinguishability. The advantage is that preserving differential
privacy only against bounded attackers helps to substantially reduce the error of the
answer provided by the mechanism. In Section 3 we investigate how to obtain a
computationally secure protocol which allows the analyst to compute only the aggregate
of all users’ data and no further information. The scheme for sum queries we are going to
construct uses a special mapping into a group, which we define formally.

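Definition 5 ($v$-isomorphic embedding). An injective mapping
$\varphi: \{-v, \dots, v\} \to V$, where $(V, \circ)$ is a group, is a $v$-isomorphic
embedding if for all $n \in \mathbb{N}$ and all finite sequences $(a_i)_{i=1,\dots,n}$ of
values in $\{-v, \dots, v\}$ with $\left|\sum_{i=1}^{n} a_i\right| \le v$:

$$\varphi\Big(\sum_{i=1}^{n} a_i\Big) = \varphi(a_1) \circ \dots \circ \varphi(a_n).$$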
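One efficiently invertible embedding (our illustrative choice; the definition itself only
asks for existence) takes $V = \mathbb{Z}^*_{p^2}$ for an odd prime $p$, with
multiplication modulo $p^2$ as $\circ$, and $\varphi(a) = (1+p)^a \bmod p^2$. Since
$(1+p)^k \equiv 1 + kp \pmod{p^2}$, the exponent is read off directly without any
discrete-logarithm search, and $\varphi$ is a $v$-isomorphic embedding for every
$v \le (p-1)/2$. The helper name `make_embedding` is hypothetical.

```python
def make_embedding(p):
    """Return (phi, phi_inv) for phi(a) = (1 + p)^a mod p^2, p an odd prime.

    phi is a v-isomorphic embedding into Z*_{p^2} for every v <= (p - 1) // 2.
    """
    p2 = p * p

    def phi(a):
        # (1 + p) has multiplicative order p modulo p^2, so reduce a mod p.
        return pow(1 + p, a % p, p2)

    def phi_inv(y):
        # (1 + p)^k = 1 + k*p (mod p^2): recover k, then map to the centred range.
        if (y - 1) % p != 0:
            raise ValueError("not in the image of phi")
        k = ((y - 1) // p) % p
        return k - p if k > p // 2 else k

    return phi, phi_inv


phi, phi_inv = make_embedding(23)  # toy prime p = 23, so v can be up to 11
print(phi(-3) * phi(5) % (23 * 23) == phi(2), phi_inv(phi(-7)))  # True -7
```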


From this definition it is clear that a $v$-isomorphic embedding is also $v'$-isomorphic
for every integer $0 < v' < v$. In the analysis of the secure protocol, we furthermore make
use of the following definition. 


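Definition 6 (Weak PRF [H]). Let $\kappa$ be a security parameter. Let $A, B, C$ be sets.
A family of functions

$$\mathcal{F} = \{F_a \mid F_a: B \to C\}_{a \in A}$$

is called a weak pseudo-random function (PRF) family if for all probabilistic
polynomial-time algorithms $\mathcal{D}_{\mathrm{PRF}}$ with oracle access to
$\mathcal{O}(\cdot)$ (where $\mathcal{O}(\cdot) \in \{F_a(\cdot), \mathrm{rand}(\cdot)\}$) on any
polynomial number of uniformly chosen inputs, we have:

$$\left|\Pr[\mathcal{D}_{\mathrm{PRF}}^{F_a(\cdot)}(\kappa) = 1] - \Pr[\mathcal{D}_{\mathrm{PRF}}^{\mathrm{rand}(\cdot)}(\kappa) = 1]\right| \le \mathrm{neg}(\kappa),$$

where $a \leftarrow_R A$ and $\mathrm{rand} \leftarrow_R \{f \mid f: B \to C\}$ is a random
mapping from $B$ to $C$.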
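A standard example of a weak PRF under the DDH assumption is $F_a(x) = x^a$ in a
group where DDH is hard. This choice is our illustration and is not fixed by the definition
above. It is also key-homomorphic: $F_{a+b}(x) = F_a(x) \cdot F_b(x)$ when keys are taken
modulo the group order. The sketch below checks this property with toy, insecure
parameters; the group (quadratic residues modulo $p^2$ for the safe prime
$p = 2 \cdot 1019 + 1 = 2039$) and all names are assumptions.

```python
import random

# Toy, insecure parameters (assumption): q = 1019 and p = 2q + 1 = 2039 are primes.
# The quadratic residues modulo p^2 form a cyclic group of order p * q.
Q, P = 1019, 2039
P2, ORDER = P * P, P * Q


def weak_prf(a, x):
    """F_a(x) = x^a mod p^2, with the key a taken from Z_{pq}."""
    return pow(x, a, P2)


def random_input(rng):
    """Uniform element of QR_{p^2}: square a uniform unit of Z*_{p^2}."""
    while True:
        r = rng.randrange(1, P2)
        if r % P != 0:
            return r * r % P2


rng = random.Random(7)  # not a cryptographic RNG; demo only
x = random_input(rng)
a, b = rng.randrange(ORDER), rng.randrange(ORDER)
# Key-homomorphism: F_{a+b}(x) = F_a(x) * F_b(x)
print(weak_prf((a + b) % ORDER, x) == weak_prf(a, x) * weak_prf(b, x) % P2)  # True
```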

2.3 Mechanism overview 

In this work we prove the following result by showing the connection between a key- 
homomorphic weak pseudo-random function and a differentially private mechanism for 
sum queries. 

Theorem 1. Let $\epsilon > 0$, $w < w' \in \mathbb{Z}$, $m, n \in \mathbb{N}$ with
$\max\{|w|, |w'|\} < m$. Let $\mathcal{D} = \{w, \dots, w'\}$ and $f_{\mathcal{D}}$ be a sum
query. If there exist groups $G' \subseteq G$, a key-homomorphic weak pseudo-random
function family mapping into $G'$ and an efficiently computable and efficiently invertible
$mn$-isomorphic embedding

$$\varphi: \{-mn, \dots, mn\} \to G,$$

then there exists an efficient mechanism for $f_{\mathcal{D}}$ that preserves
$(\epsilon, \delta)$-CDP for any constant $0 < \delta < 1$ with an error bound of
$O(S(f_{\mathcal{D}})/\epsilon)$ and requires each user to send exactly one message.

As already described, we want the untrusted analyst to be able to learn some aggregated
statistics $f_{\mathcal{D}}(D)$ but no additional information about each user’s data.
Assume that we can design a cryptographic protocol that achieves the aforementioned
goal. If we furthermore aim at preserving $(\epsilon, \delta)$-DP, it would be sufficient to
add a single copy of (properly distributed) noise $Y$ to the value $f_{\mathcal{D}}(D)$.
Since we cannot add such noise once the aggregate has been computed, the users have
to generate and add noise to their original data in such a way that the sum of the
perturbations has the same distribution as $Y$. For this purpose, we see two different
approaches. In the first approach, there is a small probability (depending on the fixed
parameter $\gamma$) for each user to add noise sufficient to preserve the privacy of the
entire statistics. This probability is calibrated in such a way that only one of the $n$
users is expected to add noise at all. Shi et al. [27] investigate this method using the
Geometric mechanism. In the second approach, each user generates noise of small
variance (again depending on $\gamma$), such that the sum of all noisy terms has enough
variance for preserving differential privacy. For this aim, we need discrete probability
distributions which are closed under convolution and are known to provide differential
privacy. The Binomial mechanism [9] and the Skellam mechanism introduced in this work
serve these purposes. In both approaches, the error which is introduced is reasonably
small and similar theoretical bounds can be provided. For details, see Section 5.
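The two approaches can be simulated side by side. In the sketch below, `n_honest` stands
for the $\gamma n$ non-compromised users, and the calibration is hypothetical: in the first
approach each user adds two-sided geometric noise with probability `p_add = 1/n_honest`
(so one user is expected to add noise), and in the second each user adds symmetric
Skellam noise $\mathrm{Pois}(\mu/n_{\mathrm{honest}}) - \mathrm{Pois}(\mu/n_{\mathrm{honest}})$.
Because independent Poisson variables add up to a Poisson variable, the total is symmetric
Skellam with parameter $\mu$ and variance $2\mu$. How $\mu$ and the geometric parameter
are calibrated for $(\epsilon, \delta)$-DP is not reproduced here.

```python
import numpy as np


def skellam_shares(rng, n_honest, mu, trials):
    """Second approach: every honest user adds Pois(mu/n) - Pois(mu/n) noise."""
    lam = mu / n_honest
    return rng.poisson(lam, (trials, n_honest)) - rng.poisson(lam, (trials, n_honest))


def geometric_shares(rng, n_honest, r, p_add, trials):
    """First approach: each honest user adds two-sided geometric noise w.p. p_add.

    Returns the noise shares and a boolean mask of which users chose to add noise.
    """
    noise = rng.geometric(1 - r, (trials, n_honest)) - rng.geometric(1 - r, (trials, n_honest))
    adds = rng.random((trials, n_honest)) < p_add
    return np.where(adds, noise, 0), adds


rng = np.random.default_rng(0)
n_honest, mu, trials = 50, 8.0, 20_000
total = skellam_shares(rng, n_honest, mu, trials).sum(axis=1)  # noise in the aggregate
print(total.mean(), total.var())  # near 0 and near 2 * mu = 16

shares, adds = geometric_shares(rng, n_honest, r=0.5, p_add=1 / n_honest, trials=trials)
print(adds.sum(axis=1).mean())  # close to 1 user adding noise per time-step
```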


For a particular time-step, let the users’ values be of the form $x_i = d_i + r_i$,
$i = 1, \dots, n$, where $d_i \in \mathcal{D}$ is the original data of user $i$ and $r_i$ is her
noise value. In the privacy analysis, it is reasonable to assume that $r_i = 0$ for the
$(1 - \gamma) \cdot n$ compromised users, since this can only increase their chances to
infer some information about the non-compromised users. In order to send the values to
the data analyst, the users perform a PSA scheme. First, each user encrypts her own
time-series data and sends the ciphertexts to the data analyst. After a distributed key
exchange, the evaluation of a single query (i.e. a query analysed in one time-step)
requires each user to send exactly one message. The data analyst appropriately
aggregates the ciphertexts of all users for a particular time-step and then decrypts the
sum of the users’ values $\sum_{i=1}^{n} x_i$. From the ciphertexts, the data analyst is not
able to learn any additional information about the values of the users, except for the
auxiliary information obtained from the compromised users. In this way, there is no
privacy breach if only one user adds the entire noise needed (first approach) or if the
non-compromised users generate noise of low variance (second approach), since the
single values are encrypted and the analyst cannot learn anything about them except for
their aggregate. Due to the use of a cryptographic protocol, the plaintexts have to be
discrete. This is the reason why we use discrete distributions for generating the noise
values $r_i$.

The perturbation of data potentially yields larger values $x_i$ due to the (possibly)
infinite support of the underlying probability distribution. Depending on the variance, we
therefore need to choose a sufficiently large interval $\widehat{\mathcal{D}} = \{-m, \dots, m\}$
as plaintext space, where $m > \max\{|w|, |w'|\}$, such that $|x_i| \le m$ for all
$i = 1, \dots, n$ with high probability. In the following, we always assume that
$\mathcal{D}$ is a subinterval of $\widehat{\mathcal{D}}$.

Since the protocol used for the data transmission is computationally secure, the entire
mechanism preserves $(\epsilon, \delta)$-CDP.


3 Private Stream Aggregation 

In this section, we define the Private Stream Aggregation scheme and give a security 
definition for it. Thereby, we mostly follow the concepts introduced by Shi et al. [27],
though we deviate in a few points. Afterwards, we give a condition for the existence of 
secure PSA schemes. Moreover, we give a concrete and efficient instantiation of a secure 
PSA scheme in the standard model. 

3.1 The definition of Private Stream Aggregation and its security

Private Stream Aggregation. A PSA scheme is a protocol for safe distributed time-series
data transfer which enables the receiver to learn only the aggregate $f(D)$ of a query
$f: \mathcal{D}^n \to O$ over some distributed (and possibly perturbed) database
$D \in \mathcal{D}^n$. Such a scheme needs a key exchange protocol for all $n$ users
together with the analyst as a precomputation, and requires each user to send exactly
one message per query. For the definition of PSA, we follow [27].

Definition 7 (Private Stream Aggregation). Let $\kappa$ be a security parameter and
$n \in \mathbb{N}$ with $n = \mathrm{poly}(\kappa)$. A Private Stream Aggregation scheme
$\Sigma = (\mathrm{Setup}, \mathrm{PSAEnc}, \mathrm{PSADec})$ is defined by three
probabilistic polynomial-time algorithms:

Setup: $(pp, T, s, s_1, \dots, s_n) \leftarrow \mathrm{Setup}(1^{\kappa})$, where $pp$ are
public parameters of the system, $T$ is a set of time-steps and $s, s_1, \dots, s_n$ are
private keys.

PSAEnc: For time-step $t \in T$ and all $i = 1, \dots, n$:

$$c_{i,t} \leftarrow \mathrm{PSAEnc}_{s_i}(t, x_i) \text{ for a data value } x_i \in \mathcal{D}.$$

PSADec: For time-step $t \in T$, ciphertexts $c_{1,t}, \dots, c_{n,t} \in C$, where $C$ is the
range of PSAEnc, and a query $f: \mathcal{D}^n \to O$, compute

$$f(x'_1, \dots, x'_n) = \mathrm{PSADec}_s(t, c_{1,t}, \dots, c_{n,t}).$$

For all $t \in T$ and $x_1, \dots, x_n \in \mathcal{D}$:

$$\mathrm{PSADec}_s\big(t, \mathrm{PSAEnc}_{s_1}(t, x_1), \dots, \mathrm{PSAEnc}_{s_n}(t, x_n)\big) = f(x_1, \dots, x_n).$$

The Setup-phase has to be carried out just once and for all, and can be performed 
with a secure multi-party protocol among all users and the analyst. In all other phases, 
no communication between the users is needed. 

The system parameters $pp$ are public and constant for all time-steps, with the implicit
understanding that they are used in $\Sigma$. Every user encrypts her data $x_i$ with her
own secret key $s_i$ and sends the ciphertext to the analyst. If the analyst receives the
ciphertexts of all users for a time-step $t$, it can compute the aggregate, i.e. the
evaluation of the query $f$ on the users’ data, with the decryption key $s$.

Security. Since our model allows the analyst to compromise users, the aggregator can
obtain auxiliary information about the data of the compromised users or their secret
keys. Even then, a secure PSA scheme should release no more information than the
aggregate of the non-compromised users’ data in a single time-step.

Informally, a PSA scheme $\Sigma$ is secure if every probabilistic polynomial-time
algorithm, with knowledge of the analyst’s and compromised users’ keys and with
adaptive encryption queries, has only negligible advantage in distinguishing between the
encryptions of two databases $D_0, D_1$ of its choice, where $f(D_0) = f(D_1)$. We can
assume that an adversary knows the secret keys of the entire compromised coalition. If
the protocol is secure against such an attacker, then it is also secure against an attacker
without the knowledge of every key from the coalition. Thus, in our security definition
we consider the most powerful adversary. In what follows, let
$f|_X: \mathcal{D}^n \to O$ denote the evaluation of a query $f: \mathcal{D}^n \to O$ with
input $D \in \mathcal{D}^n$ over a subset $X \subseteq [n]$ of users. The security definition
is similar to the one in [27].

Definition 8 (Security of PSA). Let $\mathcal{T}$ be a probabilistic polynomial-time
adversary for a PSA scheme $\Sigma = (\mathrm{Setup}, \mathrm{PSAEnc}, \mathrm{PSADec})$
and let $f: \mathcal{D}^n \to O$ be a statistical query over the set $\mathcal{D}$. Let $T$ be
the set of time-steps for possible data analyses. We define the following security game
between a challenger and the adversary $\mathcal{T}$.

Setup. The challenger runs the Setup algorithm on input security parameter $\kappa$ and
returns public parameters $pp$, time-steps $T$ with $|T| = \mathrm{poly}(\kappa)$ and secret
keys $s, s_1, \dots, s_n$. It sends $\kappa, pp, T, s$ to $\mathcal{T}$.

Queries. The challenger flips a random bit $b \leftarrow_R \{0, 1\}$. $\mathcal{T}$ chooses
$U \subseteq [n]$ and sends it to the challenger, which returns $(s_i)_{i \in [n] \setminus U}$.
$\mathcal{T}$ is allowed to query $(i, t, x_i)$ with $i \in U$, $t \in T$, $x_i \in \mathcal{D}$,
and the challenger returns

$$\mathrm{PSAEnc}_{s_i}(t, x_i).$$

Challenge. $\mathcal{T}$ chooses $t^* \in T$ such that no encryption query at $t^*$ was
made. (If there is no such $t^*$, the challenger simply aborts.) $\mathcal{T}$ queries two
different tuples $(x_i^{[0]})_{i \in U}, (x_i^{[1]})_{i \in U}$ with

$$f|_U\big((x_i^{[0]})_{i \in U}\big) = f|_U\big((x_i^{[1]})_{i \in U}\big).$$

For all $i \in U$ the challenger returns

$$c_{i,t^*} \leftarrow \mathrm{PSAEnc}_{s_i}(t^*, x_i^{[b]}).$$

Queries. $\mathcal{T}$ is allowed to make the same type of queries as before, with the
restriction that no encryption query at $t^*$ can be made.

Guess. $\mathcal{T}$ outputs a guess about $b$.

The adversary wins the game if it correctly guesses $b$. A PSA scheme is secure if no
probabilistic polynomial-time adversary $\mathcal{T}$ has more than negligible advantage
(with respect to the parameter $\kappa$) in winning the above game.
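The rules of this game can be made concrete as challenger-side bookkeeping. The sketch
below is a hypothetical harness for a sum query: `SumPSAGame` and its methods are
illustrative names, users are 0-indexed, and `enc(i, t, x)` is a caller-supplied stand-in for
$\mathrm{PSAEnc}_{s_i}(t, x)$. It only enforces the restrictions of Definition 8 (fixed $U$
before any query, no queries at $t^*$, two different challenge tuples with equal sums); the
abort when no unused $t^*$ exists is not modelled.

```python
import secrets


class SumPSAGame:
    """Challenger bookkeeping for the Definition 8 game with a sum query."""

    def __init__(self, keys, enc):
        self.keys, self.enc = keys, enc
        self.b = secrets.randbelow(2)  # hidden challenge bit
        self.U = None                  # non-compromised users, fixed before any query
        self.queried_steps = set()
        self.t_star = None

    def choose_users(self, U):
        if self.U is not None:
            raise RuntimeError("U is chosen once, before the first query")
        self.U = frozenset(U)
        # The adversary receives the keys of all compromised users.
        return {i: k for i, k in enumerate(self.keys) if i not in self.U}

    def encryption_query(self, i, t, x):
        if self.U is None or i not in self.U:
            raise ValueError("encryption queries are only allowed for users in U")
        if t == self.t_star:
            raise ValueError("no encryption queries at the challenge time-step")
        self.queried_steps.add(t)
        return self.enc(i, t, x)

    def challenge(self, t_star, x0, x1):
        """x0, x1: dicts mapping each i in U to a plaintext."""
        if self.U is None or self.t_star is not None:
            raise RuntimeError("challenge comes after choose_users, and only once")
        if t_star in self.queried_steps:
            raise ValueError("t* must not have been queried")
        if set(x0) != self.U or set(x1) != self.U or x0 == x1:
            raise ValueError("need two different tuples indexed by U")
        if sum(x0.values()) != sum(x1.values()):
            raise ValueError("f|_U must agree on both tuples")
        self.t_star = t_star
        chosen = x1 if self.b else x0
        return {i: self.enc(i, t_star, chosen[i]) for i in self.U}

    def guess(self, b_guess):
        return b_guess == self.b
```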

Encryption queries are made only for $i \in U$ since, knowing the secret key for all
$i \in [n] \setminus U$, the adversary can encrypt a value autonomously. If encryption
queries in time-step $t^*$ were allowed, then no deterministic scheme would be secure.
The adversary $\mathcal{T}$ can determine the original data of all $i \in [n] \setminus U$ for
every time-step, since it knows their keys $(s_i)_{i \in [n] \setminus U}$.

Then $\mathcal{T}$ can compute the aggregate of the non-compromised users’ data. For
example, when $f = f_{\mathcal{D}}$ is a sum query, we have

$$f|_U(D) = \mathrm{PSADec}_s(t, c_{1,t}, \dots, c_{n,t}) - f|_{[n] \setminus U}(D),$$

where $D = (x_1, \dots, x_n)$ and $c_{i,t}$ is the encryption of $x_i$ for all $i \in [n]$. On
the other hand, if there is a user’s ciphertext which $\mathcal{T}$ does not receive, then it
cannot obtain the aggregate for the corresponding time-step.

The security definition indicates that $\mathcal{T}$ cannot distinguish between the
encryptions of two different data collections $(x_i^{[0]})_{i \in U}, (x_i^{[1]})_{i \in U}$
with the same aggregate at time-step $t^*$.
For proving that a secure PSA scheme can be used for computing differentially private 
statistics with small error, we have to slightly modify the security game such that an 
adversary may choose adjacent (and non-perturbed) databases, as it is required in the 
definition of differential privacy. For details, see Section [I] 

Definition 8 differs from the definition of Aggregator Obliviousness [27] since we require
the adversary to specify the set $U$ of non-compromised users before making any query,
i.e. we do not allow the adversary to determine $U$ adaptively.

3.2 Feasibility of efficient and secure Private Stream Aggregation

Using a secure PSA scheme, we ensure that the transmitted data of non-compromised 
users do not disclose sensitive information other than their aggregate. We now state the 
condition for the existence of secure (as in Definition 8) PSA schemes for sum queries.


Theorem 2. Let $\kappa$ be a security parameter, and $m, n \in \mathbb{N}$ with
$\log(m) = \mathrm{poly}(\kappa)$, $n = \mathrm{poly}(\kappa)$. Let $(G, \cdot)$, $(S, *)$ be
finite groups and $G' \subseteq G$. For some finite set $M$, let

$$\mathcal{F} = \{F_s \mid F_s: M \to G'\}_{s \in S}$$

be a family of functions which are homomorphic over $S$ and

$$\varphi: \{-mn, \dots, mn\} \to G$$

an $mn$-isomorphic embedding. If $\mathcal{F}$ is a weak PRF family, then the following PSA
scheme $\Sigma = (\mathrm{Setup}, \mathrm{PSAEnc}, \mathrm{PSADec})$ is secure:

Setup: $(pp, T, s, s_1, \dots, s_n) \leftarrow \mathrm{Setup}(1^{\kappa})$, where $pp$ are
parameters of $G, G', S, M, \mathcal{F}, \varphi$. The keys are $s_i \leftarrow_R S$ for all
$i \in [n]$ with $s = (s_1 * \cdots * s_n)^{-1}$ (the inverse taken in $(S, *)$), and
$T \subseteq M$ such that all $t \in T$ are chosen uniformly at random from $M$.

PSAEnc: Compute $c_{i,t} = F_{s_i}(t) \cdot \varphi(x_i)$ in $G$, where
$x_i \in \widehat{\mathcal{D}} = \{-m, \dots, m\}$, $t \in T$.

PSADec: Compute $\varphi\big(\sum_{i=1}^{n} x_i\big) = F_s(t) \cdot c_{1,t} \cdot \ldots \cdot c_{n,t}$
and invert.
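The three algorithms can be sketched for one concrete, assumed choice of ingredients:
$G = \mathbb{Z}^*_{p^2}$, $M = G'$ the quadratic residues modulo $p^2$,
$S = \mathbb{Z}_{pq}$ under addition, $F_s(t) = t^s$ and $\varphi(x) = (1+p)^x \bmod p^2$
for a safe prime $p = 2q + 1$. These are our own illustrative choices in the spirit of the
DDH-based instantiation below; `ToyPSA` is a hypothetical name, the parameters are
toy-sized, and Python's `random` module is not a cryptographic RNG, so this demonstrates
correctness only and is not a secure implementation. Decryption is correct as long as
$|\sum_i x_i| \le mn < q$.

```python
import random


class ToyPSA:
    """Toy instance of the Theorem 2 scheme (insecure parameter sizes)."""

    def __init__(self, n, q=1019, p=2039, seed=None):
        self.n, self.p, self.p2, self.order = n, p, p * p, p * q
        self.rng = random.Random(seed)
        # Setup: user keys s_i uniform in Z_{pq}; analyst key s = -(s_1 + ... + s_n).
        self.user_keys = [self.rng.randrange(self.order) for _ in range(n)]
        self.analyst_key = -sum(self.user_keys) % self.order

    def new_time_step(self):
        # t uniform in QR_{p^2}: square a uniform unit of Z*_{p^2}.
        while True:
            r = self.rng.randrange(1, self.p2)
            if r % self.p:
                return r * r % self.p2

    def _phi(self, x):
        return pow(1 + self.p, x % self.p, self.p2)

    def _phi_inv(self, y):
        k = ((y - 1) // self.p) % self.p
        return k - self.p if k > self.p // 2 else k

    def encrypt(self, i, t, x):
        # PSAEnc: c_{i,t} = F_{s_i}(t) * phi(x_i) in G
        return pow(t, self.user_keys[i], self.p2) * self._phi(x) % self.p2

    def decrypt(self, t, ciphertexts):
        # PSADec: F_s(t) * c_{1,t} * ... * c_{n,t} = phi(x_1 + ... + x_n); invert phi.
        acc = pow(t, self.analyst_key, self.p2)
        for c in ciphertexts:
            acc = acc * c % self.p2
        return self._phi_inv(acc)


psa = ToyPSA(n=5, seed=3)
t = psa.new_time_step()
data = [100, -37, 0, 55, -100]  # m = 100, so |sum| <= m * n = 500 < q = 1019
cts = [psa.encrypt(i, t, x) for i, x in enumerate(data)]
print(psa.decrypt(t, cts))  # 18
```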

Thus, we need a key-homomorphic weak PRF and a mapping which homomorphically
aggregates all users' data. Since every data value is at most $m$ in absolute value, the aggregate is at most $m \cdot n$ in absolute value, and the scheme correctly retrieves it by the $mn$-isomorphic property of $\varphi$. Importantly, the product of all pseudo-random values $F_s(t), F_{s_1}(t), \ldots, F_{s_n}(t)$ is the neutral element in the group $G$ for all $t \in T$. Since the values in $T$ are uniformly distributed in $M$, it is enough to require that $\mathcal{F}$ is a weak PRF family. Thus, the statement of this
theorem does not require a random oracle.
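For completeness, the correctness of PSADec can be written out in one line. This only restates the argument above: it uses the key-homomorphism $F_{s}(t) \cdot F_{s'}(t) = F_{s * s'}(t)$ (so that the neutral element $e$ of $S$ gives $F_{e}(t) = 1$), the choice $s = (*_{i=1}^{n} s_i)^{-1}$ and the homomorphic property of $\varphi$ on $\{-mn, \ldots, mn\}$, and it assumes, as in Example 1, that $G$ is abelian so that the factors can be regrouped:

$$F_s(t) \cdot \prod_{i=1}^{n} c_{i,t} = \Big(F_s(t) \cdot \prod_{i=1}^{n} F_{s_i}(t)\Big) \cdot \prod_{i=1}^{n} \varphi(x_i) = F_{s * s_1 * \cdots * s_n}(t) \cdot \varphi\Big(\sum_{i=1}^{n} x_i\Big) = F_{e}(t) \cdot \varphi\Big(\sum_{i=1}^{n} x_i\Big) = \varphi\Big(\sum_{i=1}^{n} x_i\Big).$$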

The proof of Theorem 2 works with a sequence of games and builds on the ideas of [27].
The details of the proof are given in Appendix A; here we only give the main ideas. In the
first step (Lemma 9) we construct a real-or-random version of the PSA security game,
where encryption is performed, with equal probability, with a weak PRF or with a random function.
We show that winning the PSA security game is at least as hard as winning its
real-or-random version. The second step (Lemma 10) shows that the plaintext dependence
of the ciphertexts generated in the game can be removed. Since we are dealing with
a non-adaptive security definition, there is no need to simulate the random choice of
time-steps by programming a random oracle, as is required in the proof of Aggregator
Obliviousness by Shi et al. [27]. Therefore, in contrast to [27], our result does not rely on
a random oracle and the full proof works in the standard model. In the last step (Lemma
11), using a hybrid argument, we show that winning the game is at least as hard as
distinguishing the weak PRF from a random function. Here, the adversary's specification
of U before making the first query allows the PRF distinguisher to answer the queries
consistently with either truly random values or pseudo-random values. All in all, we
get that winning the first game with our construction is at least as hard as distinguishing
the weak PRF from a random function, which completes the proof of Theorem 2.

3.3 An efficient Private Stream Aggregation scheme 

We give an instantiation of a secure PSA scheme consisting of efficient algorithms. Its 
security is based on the Decisional Diffie-Hellman (DDH) problem. 


Example 1. Let $q > m \cdot n$ and $p = 2 \cdot q + 1$ be large primes. Let furthermore $G = \mathbb{Z}^*_{p^2}$, $S = \mathbb{Z}_{pq}$, $M = G' = \mathcal{QR}_{p^2}$ and $g \in \mathbb{Z}^*_{p^2}$ with $\mathrm{ord}(g) = pq$. Then $g$ generates the group $M = G' = \mathcal{QR}_{p^2}$ of quadratic residues modulo $p^2$. In this group DDH is assumed to be hard. Then we define

• $pp = (g, p)$. Choose keys $s_1, \ldots, s_n \leftarrow_R \mathbb{Z}_{pq}$ and $s = -\sum_{i=1}^{n} s_i \bmod pq$. Let $T \subset M$, i.e. $t$ is a power of $g$ for every $t \in T$.

• $F_{s_i}(t) = t^{s_i} \bmod p^2$. This is a weak PRF under the DDH assumption, as can be proven using arguments similar to the ones in [25].

• $\varphi(x_i) = 1 + p \cdot x_i \bmod p^2$, where $-m \le x_i \le m$. (It is easy to see that $\varphi$ is an $mn$-isomorphic embedding.)
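For aggregation, we compute $X \in \{1 - p \cdot mn, \ldots, 1 + p \cdot mn\}$ with

$$X = F_s(t) \cdot \prod_{i=1}^{n} \big(F_{s_i}(t) \cdot \varphi(x_i)\big) = \prod_{i=1}^{n} (1 + p \cdot x_i) = 1 + p \cdot \sum_{i=1}^{n} x_i + p^2 \cdot \sum_{\substack{i,j \in [n] \\ i < j}} x_i x_j + \ldots + p^n \cdot \prod_{i=1}^{n} x_i = 1 + p \cdot \sum_{i=1}^{n} x_i \bmod p^2$$

and decrypt $\sum_{i=1}^{n} x_i = (X - 1)/p$ over the integers.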

The difference from the scheme introduced in [27] lies in the map $\varphi$. Whereas the PRF
in [27] works similarly (the underlying group $G$ is $\mathbb{Z}^*_p$ rather than $\mathbb{Z}^*_{p^2}$), the aggregation
function is defined by

$$\varphi(x_i) = g^{x_i} \bmod p,$$

which requires solving the discrete logarithm modulo $p$ for decryption. In contrast, our
efficient construction only requires a subtraction and a division over the integers.
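The difference in decryption cost can be seen on toy numbers. In the sketch below (toy prime p = 23 with generator g = 5 of Z*_23, aggregate bound mn = 10, all our choices), decrypting an aggregate encoded as g^S mod p requires searching the candidate aggregates, so the work grows with mn, whereas inverting 1 + p·S mod p² takes one subtraction and one division, independently of m. The search shown is plain exhaustive search, not Pollard's lambda method.

```python
def decrypt_shi_bruteforce(X: int, g: int, p: int, bound: int):
    """Recover S from X = g^S mod p by trying every S in [-bound, bound]; returns (S, steps)."""
    for steps, S in enumerate(range(-bound, bound + 1), start=1):
        if pow(g, S, p) == X:  # negative exponents need Python >= 3.8
            return S, steps
    raise ValueError("aggregate outside the searched range")

def decrypt_ours(X: int, p: int) -> int:
    """Recover S from X = 1 + p*S mod p^2 with one subtraction and one division."""
    v = (X - 1) // p
    return v - p if v > p // 2 else v

p, g, bound = 23, 5, 10  # 5 has order 22 > 2*bound modulo 23, so g^S is unique on [-10, 10]
for S in (-3, 7):
    S_bf, steps = decrypt_shi_bruteforce(pow(g, S, p), g, p, bound)
    S_ours = decrypt_ours((1 + p * S) % (p * p), p)
    print(S, S_bf, steps, S_ours)
```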

Remark 1. In the random oracle model, the construction shown in Example 1 achieves
the stronger notion of Aggregator Obliviousness, which is the adaptive version of our
security definition (for details, see the proof in Appendix A of [27]). The same proof can
be applied to our instantiation by simply replacing the map $\varphi$ involved and using a strong
version of the PRF.¹

Note that for a given $p$, the running time of the decryption in our scheme does not
depend on $m$, so it remains small even if $m$ is exponentially large. On the other hand,
the decryption of the scheme in [27] can also be performed efficiently,
even if $m$ is superpolynomial in the security parameter: discretise the plaintext space
into $\sqrt{n}$ equidistant values and let each user choose the value nearest to her original
one as input for encryption. The aggregated value has the correct expectation, but the
decryption algorithm only has to search a range of $n^{1.5}$ values. However, this method
causes a superlinear time-dependence on $n$ for the decryption and induces an additional
aggregation error due to discretisation.

¹ For showing Aggregator Obliviousness we would have to substitute the choice of random values $t$
by a hash function $H : \mathcal{M} \to \mathcal{QR}_{p^2}$ modeled as a random oracle for some domain $\mathcal{M}$. Therefore the
PRF would be $F_{s_i}(t) = H(t)^{s_i} \bmod p^2$, which is the strong version of the weak PRF in Example 1. In
this case, all $t$ may be chosen in a deterministic way.

Length of p     1024-bit    2048-bit    4096-bit
[27]            1.1 ms      7.5 ms      57.0 ms
This work       3.9 ms      29.4 ms     225.0 ms

Table 1: Time for encryption

m                     10^0      10^1      10^2      10^3       10^4
[27] (brute force)    0.04 s    0.24 s    2.67 s    28.97 s    381.05 s
This work             0.09 s    0.08 s    0.08 s    0.09 s     0.08 s

Table 2: Time for decryption (2048-bit p, n = 1000)

We compare the practical running times for encryption and decryption of the scheme in
[27] with those of our scheme in Table 1 and Table 2, respectively. Here, let
$m$ denote the size of the plaintext space. Encryption is compared at different security
levels with $m = 1$. For comparing the decryption time, we fix the security level and the
number of users and let $m$ vary.

All algorithms were executed on an Intel Core i5 64-bit CPU at 2.67 GHz. We compared
the schemes at the same security level, assuming that the DDH problem modulo $p$ is as
hard as modulo $p^2$, i.e. we used the same value of $p$ in both schemes. For different
bit-lengths of $p$, we observe that the encryption of our scheme is roughly 4 times slower
than the encryption in [27]. The running time of our decryption algorithm is largely
dominated by the aggregation phase; therefore, it depends linearly on
$n$. Using a 2048-bit prime and fixing $n = 1000$, the running time of the decryption in
our scheme is less than 0.1 seconds for all tested values of $m$. In contrast, the time for the
brute-force decryption in [27] grows roughly linearly in $m$.

As observed in [27], using Pollard's lambda method would reduce the running time for
decryption in [27] to about $\sqrt{mn}$. Nevertheless, our scheme provides a speed-up of $\sqrt{mn}/n = \sqrt{m/n}$
whenever $m$ is larger than $n$, while the encryption is slowed down only by a constant factor.
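As a rough illustration of this claim, the ratio √(mn)/n = √(m/n) between the operation counts of Pollard's lambda method and of our aggregation-dominated decryption can be tabulated. The sketch ignores constant factors and the cost of individual group operations (our simplification), so it indicates the trend rather than measured running times.

```python
import math

def pollard_vs_ours(m: int, n: int) -> float:
    """Ratio of ~sqrt(m*n) group operations (Pollard's lambda) to ~n (our aggregation)."""
    return math.sqrt(m * n) / n

print(round(pollard_vs_ours(10**4, 1000), 2))  # 3.16
print(round(pollard_vs_ours(10**8, 1000), 2))  # 316.23
```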


4 Achieving Computational Differential Privacy 

Notation 1. Let $\kappa$ be a security parameter. If an expression $\omega = \omega(\kappa)$ is non-negligible
in $\kappa$ (i.e. if $\omega \ge 1/\mathrm{poly}(\kappa)$), then we write $\omega > \mathrm{neg}(\kappa)$.

In this section, we describe how to preserve computational differential privacy using
a PSA scheme. As described above, in the work by Chan et al. [7] the polynomial-time
reduction between an attacker against the security of a PSA scheme and an attacker
against differential privacy is missing. Here we provide an appropriate reduction.
The content of this section is independent of Theorem 2. Specifically, let $\mathcal{A}$ be a
mechanism which, given some event Good, evaluates a statistical query $f : \mathcal{D}^n \to O$ over
a database $D \in \mathcal{D}^n$ preserving $\varepsilon$-DP. Furthermore, let $\Sigma$ be a secure PSA scheme for $f$.















We show that $\mathcal{A}$ executed through $\Sigma$ preserves $\varepsilon$-CDP given Good. Let $\mathrm{Bad} = \overline{\mathrm{Good}}$ and
assume $\Pr[\mathrm{Bad}] \le \delta$. In Section 5 we will give instantiations of such a mechanism $\mathcal{A}$
and show that they preserve $(\varepsilon, \delta)$-CDP unconditionally if executed through $\Sigma$. For
simplicity, in this section we focus on sum queries, but our analysis can be easily extended
to more general statistical queries. Our technique involves a reduction-based proof using
a biased coin toss and is of independent interest.

4.1 Redefining the security of Private Stream Aggregation 

Let us first modify the security game in Definition 5 in the following way. Let game
1 be the original game from Definition 5. Let $p \in (0, 1)$ and $P = \max\{p, 1 - p\}$. The
P-game 1 for a probabilistic polynomial-time adversary $\mathcal{T}_{1,P}$ is defined as game 1 with
the following changes:

• Before the challenge phase, $\mathcal{T}_{1,P}$ sends $p$ to the challenger.

• In the challenge phase, the challenger chooses $b = 0$ with probability $p$ and $b = 1$
with probability $1 - p$.

We call a PSA scheme P-secure if the probability of every probabilistic polynomial-time
adversary $\mathcal{T}_{1,P}$ of winning the above game is at most $P + \mathrm{neg}(\kappa)$. Note that game 1 is a special
case of P-game 1, where $P = 1/2$. We refer to this case as the unbiased version (and to the case
$P > 1/2$ as the biased version) of P-game 1. In the unbiased case, we just drop the
dependence on $P$ and the adversary is not required to send $p$ to its challenger.
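The baseline $P$ in this definition can be checked with a short Monte Carlo simulation: an adversary that ignores all ciphertexts and always guesses the a-priori more likely bit already wins P-game 1 with probability $P = \max\{p, 1 - p\}$, which is why P-security measures the advantage above $P$ rather than above $1/2$. The function name is ours; this is an illustration, not part of the construction.

```python
import random

def trivial_win_rate(p: float, trials: int, seed: int = 0) -> float:
    """Empirical win rate in P-game 1 of an adversary that ignores all ciphertexts."""
    rng = random.Random(seed)
    guess = 0 if p >= 0.5 else 1  # always guess the a-priori more likely challenge bit
    wins = 0
    for _ in range(trials):
        b = 0 if rng.random() < p else 1  # challenger: b = 0 with probability p
        wins += guess == b
    return wins / trials

for p in (0.5, 0.8, 0.3):
    print(p, max(p, 1 - p), round(trivial_win_rate(p, 100_000), 3))
```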

4.2 Constructing a PSA adversary using a CDP adversary 

4.2.1 Security game for adjacent databases 

To show that a P-secure PSA scheme is suitable for preserving CDP, we have to
construct a successful adversary in P-game 1 (with a proper choice of $p$) using a successful
distinguisher for adjacent databases. We define the following game 0 for a probabilistic
polynomial-time adversary $\mathcal{T}_0$, which is identical to game 1 except for a changed challenge
phase:

Challenge. $\mathcal{T}_0$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made. $\mathcal{T}_0$
queries two adjacent tuples $(d_i^{(0)})_{i \in U}, (d_i^{(1)})_{i \in U}$. For all $i \in U$ the challenger returns

$$c_{i,t^*} = \mathrm{PSAEnc}_{s_i}(t^*, x_i^{(b)}),$$

where $x_i^{(b)} \in \mathcal{D}$ is a noisy version of $d_i^{(b)} \in \mathcal{D}$ for all $i \in U$, obtained by some
randomised perturbation process.

Now consider the following experiment, which we call $\mathrm{Exp}_1$. Let $f_{\mathcal{D}} : \mathcal{D}^n \to O$ be a sum
query and $Q : \mathcal{D}^n \to [0, 1]$ a probability distribution function on $\mathcal{D}^n$. For simplicity we
consider only the case where $\mathcal{D} \subseteq \mathbb{Z}$. Let $D_0, D_1 \in \mathcal{D}^n$. Then $\mathrm{Exp}_1$ is performed as
follows:

• Let $B_{1/2}$ be a Bernoulli variable with $\Pr[B_{1/2} = 0] = 1/2$.

• Let $X_1$ be a random vector with probability distribution function $Q(x - D_b)$, where
$b$ is a realisation of $B_{1/2}$.

• Let $Y = f_{\mathcal{D}}(X_1)$.

We now define an experiment $\mathrm{Exp}_2$ and afterwards show that it is statistically equivalent
to $\mathrm{Exp}_1$.

• Let $Y$ be a random variable as in $\mathrm{Exp}_1$.

• Let $p = \Pr[B_{1/2} = 0 \mid Y = y]$. Let $B_p$ be a Bernoulli variable with $\Pr[B_p = 0] = p$.

• Let $X_2$ be a random vector with conditional probability function

$$\Pr[X_2 = D \mid B_p = b, Y = y] = \begin{cases} \dfrac{\chi(y) \cdot Q(D - D_0)}{2p \Pr[Y = y]}, & \text{if } b = 0, \\[2ex] \dfrac{\chi(y) \cdot Q(D - D_1)}{2(1 - p) \Pr[Y = y]}, & \text{if } b = 1. \end{cases}$$

Here $b$ is a realisation of $B_p$ and $\chi(y) = \chi_{\{z \in O \mid z = f_{\mathcal{D}}(D)\}}(y)$ denotes the characteristic
function of $\{z \in O \mid z = f_{\mathcal{D}}(D)\} \subseteq O$, which is 1 if $y = f_{\mathcal{D}}(D)$ and 0 otherwise.
Note that the values $\chi(y)$, $Q(x)$, $p$, $\Pr[Y = y]$ needed for the computation of the conditional
probability of $X_2$ are known.

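To show that $\mathrm{Exp}_1$ and $\mathrm{Exp}_2$ are statistically equivalent, it suffices to show that
the joint distributions $\Pr[B_{1/2}, X_1, Y]$ and $\Pr[B_p, X_2, Y]$ are equal.

Lemma 3. $\Pr[B_{1/2}, X_1, Y] = \Pr[B_p, X_2, Y]$.

Proof. We observe that in $\mathrm{Exp}_1$, $\Pr[Y = y \mid X_1 = D, B_{1/2} = b] = \Pr[Y = y \mid X_1 = D]$.
Therefore we have

$$\Pr[X_1 = D \mid B_{1/2} = b, Y = y] = \frac{\Pr[Y = y \mid X_1 = D] \cdot Q(D - D_b)}{\Pr[Y = y \mid B_{1/2} = b]} = \frac{\chi(y) \cdot Q(D - D_b)}{\Pr[Y = y \mid B_{1/2} = b]},$$

which exactly corresponds to the conditional probability of $X_2$ in $\mathrm{Exp}_2$, since by Bayes' formula $\Pr[Y = y \mid B_{1/2} = 0] = 2p \Pr[Y = y]$ and $\Pr[Y = y \mid B_{1/2} = 1] = 2(1 - p) \Pr[Y = y]$. Thus, we have

$$\Pr[B_p, X_2, Y] = \Pr[Y] \cdot \Pr[B_p \mid Y] \cdot \Pr[X_2 \mid B_p, Y] = \Pr[Y] \cdot \Pr[B_{1/2} \mid Y] \cdot \Pr[X_1 \mid B_{1/2}, Y] = \Pr[B_{1/2}, X_1, Y].$$

□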


Note that Lemma 3 also applies to the marginals of the triples $(B_{1/2}, X_1, Y)$ and $(B_p, X_2, Y)$.
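Lemma 3 can be sanity-checked exactly on a toy instance. The sketch below uses hypothetical parameters of our choosing (two users, independent per-user noise on {−1, 0, 1}, adjacent databases D_0 = (0, 0) and D_1 = (1, 0), sum query) and exact rational arithmetic. It builds the joint distribution of Exp_1 directly and the joint distribution of Exp_2 only from Pr[Y = y], p, χ and Q, as Exp_2 prescribes; both coincide, and every conditional distribution of X_2 sums to 1.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy instance: n = 2 users, per-user noise on {-1, 0, 1}, adjacent D_0, D_1.
NOISE = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
DATABASES = [(0, 0), (1, 0)]
SUPPORT = list(product(range(-1, 3), repeat=2))  # every possible noisy vector x

def Q(z):
    """Q(z): probability of the noise vector z (independent coordinates)."""
    prob = Fraction(1)
    for zi in z:
        prob *= NOISE.get(zi, Fraction(0))
    return prob

def f(x):
    return sum(x)  # the sum query f_D

def shift(x, d):
    return tuple(xi - di for xi, di in zip(x, d))

def exp1_joint():
    """Pr[B_{1/2} = b, X_1 = x, Y = y] in Exp_1."""
    joint = {}
    for b, x in product((0, 1), SUPPORT):
        w = Fraction(1, 2) * Q(shift(x, DATABASES[b]))
        if w:
            joint[(b, x, f(x))] = w
    return joint

def exp2_joint(j1):
    """Pr[B_p = b, X_2 = x, Y = y] in Exp_2, built only from Pr[Y = y], p, chi and Q."""
    pr_y, pr_b0_y = {}, {}
    for (b, _, y), w in j1.items():
        pr_y[y] = pr_y.get(y, 0) + w
        if b == 0:
            pr_b0_y[y] = pr_b0_y.get(y, 0) + w
    joint, cond_mass = {}, {}
    for y, py in pr_y.items():
        p = pr_b0_y.get(y, 0) / py  # p = Pr[B_{1/2} = 0 | Y = y]
        for b, pb in ((0, p), (1, 1 - p)):
            if pb == 0:
                continue  # this value of B_p never occurs together with this y
            for x in SUPPORT:
                if f(x) != y:
                    continue  # chi(y) = 0
                cond = Q(shift(x, DATABASES[b])) / (2 * pb * py)  # Pr[X_2 = x | B_p = b, Y = y]
                cond_mass[(b, y)] = cond_mass.get((b, y), 0) + cond
                if cond:
                    joint[(b, x, y)] = py * pb * cond
    return joint, cond_mass

j1 = exp1_joint()
j2, cond_mass = exp2_joint(j1)
print(j1 == j2, all(v == 1 for v in cond_mass.values()))  # True True
```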






4.2.2 The Reduction 


With Lemma 3 in mind, we can show that a successful adversary in game 0 yields a
successful adversary in P-game 1 for a particular $P \in [1/2, 1)$. Afterwards we show that
a successful adversary in P-game 1 for any $P \in [1/2, 1)$ yields a successful adversary in
game 1.

Lemma 4. Let $\kappa$ be a security parameter. Let $\mathcal{T}_0$ be an adversary in game 0 with
advantage $\mu_0(\kappa) > \mathrm{neg}(\kappa)$. Let $B_{1/2}$ denote the random variable describing the challenge bit
$b$ in game 0 and let $Y$ denote the random variable describing the aggregate of $(x_i^{(b)})_{i \in U}$.
Let $p$ be the probability of $B_{1/2} = 0$ given the choice of $Y$ and let $P = \max\{p, 1 - p\}$.
Then there exists an adversary $\mathcal{T}_{1,P}$ in P-game 1 with advantage $\mu_{1,P}(\kappa) > \mathrm{neg}(\kappa)$.

Proof. We construct a successful adversary $\mathcal{T}_{1,P}$ in P-game 1 using $\mathcal{T}_0$ as follows:

Setup. Receive $\kappa, pp, T, s$ from the P-game 1 challenger and send them to $\mathcal{T}_0$.

Queries. Receive $U = \{i_1, \ldots, i_u\} \subseteq [n]$ from $\mathcal{T}_0$ and send it to the challenger. Forward
the obtained response to $\mathcal{T}_0$. Forward $\mathcal{T}_0$'s queries $(i, t, d_i)$ with $i \in U$, $t \in T$, $d_i \in \mathcal{D}$ to the challenger and forward the obtained response $c_{i,t}$ to $\mathcal{T}_0$.

Challenge. $\mathcal{T}_0$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made and queries
two adjacent tuples $(d_i^{(0)})_{i \in U}, (d_i^{(1)})_{i \in U}$. Choose a realisation $y$ of $Y$ according
to $\mathrm{Exp}_2$. Set $p = \Pr[B_{1/2} = 0 \mid Y = y]$ and choose $(x_i^{(a)})_{i \in U}$ with probability
$\Pr[X_2 = (x_i^{(a)})_{i \in U} \mid B_p = a, Y = y]$ for $a = 0, 1$ according to $\mathrm{Exp}_2$. Send $p$, $t^*$,
$(x_i^{(0)})_{i \in U}, (x_i^{(1)})_{i \in U}$ to the challenger. Obtain the response $(c_{i,t^*})_{i \in U}$ and forward it to
$\mathcal{T}_0$.

Queries. $\mathcal{T}_0$ can make the same type of queries as before with the restriction that no
encryption query at $t^*$ can be made.

Guess. $\mathcal{T}_0$ gives a guess about which database was encrypted. Output the same guess.

The rules of P-game 1 are preserved since $\mathcal{T}_{1,P}$ sends two tuples with the same aggregate
$y$ to its challenger. On the other hand, since the ciphertexts generated by the challenger
are determined by the challenge bit and the collection $(x_i^{(b)})_{i \in U}$, the rules of game 0 are
preserved by Lemma 3 (the triple $(b, (x_i^{(b)})_{i \in U}, y)$ is chosen according to $\mathrm{Exp}_2$). Therefore
$\mathcal{T}_{1,P}$ perfectly simulates game 0 and has the same advantage as $\mathcal{T}_0$. □

We now show that a secure PSA scheme is also P-secure for every $p \in (0, 1)$, where
$P = \max\{p, 1 - p\}$.

Lemma 5. Let $\kappa$ be a security parameter. For any $p \in (0, 1)$ let $\mathcal{T}_{1,P}$ be an adversary
in P-game 1 with advantage $\mu_{1,P}(\kappa) > \mathrm{neg}(\kappa)$. Then there exists an adversary $\mathcal{T}_1$ in
game 1 with advantage $\mu_1(\kappa) > \mathrm{neg}(\kappa)$.

Proof. Given a successful adversary $\mathcal{T}_{1,P}$ in P-game 1, we construct a successful
adversary $\mathcal{T}_1$ in game 1 as follows:

Setup. Receive $\kappa, pp, T, s$ from the game 1 challenger and send them to $\mathcal{T}_{1,P}$.

Queries. Receive $U \subseteq [n]$ from $\mathcal{T}_{1,P}$ and send it to the challenger. Forward the
obtained response $(s_i)_{i \in [n] \setminus U}$ to $\mathcal{T}_{1,P}$. Forward $\mathcal{T}_{1,P}$'s queries $(i, t, x_i)$ with $i \in U$, $t \in T$, $x_i \in \mathcal{D}$ to the challenger and forward the obtained response $c_{i,t}$ to $\mathcal{T}_{1,P}$.

Challenge. $\mathcal{T}_{1,P}$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made, sends
$p \in (0, 1)$ and queries two different tuples $(x_i^{(0)})_{i \in U}, (x_i^{(1)})_{i \in U}$ with $f_{\mathcal{D}}((x_i^{(0)})_{i \in U}) = f_{\mathcal{D}}((x_i^{(1)})_{i \in U})$.
Choose a bit $a$ with $\Pr[a = 0] = p$, $\Pr[a = 1] = 1 - p$ and query
$(x_i^{(a)})_{i \in U}, (x_i)_{i \in U}$ to the challenger, where the $x_i$ are chosen uniformly at random
from $\mathcal{D}$ such that $f_{\mathcal{D}}((x_i)_{i \in U}) = f_{\mathcal{D}}((x_i^{(a)})_{i \in U})$. Obtain the response $(c_{i,t^*})_{i \in U}$
and forward it to $\mathcal{T}_{1,P}$.

Queries. $\mathcal{T}_{1,P}$ can make the same type of queries as before with the restriction that
no encryption query at $t^*$ can be made.

Guess. $\mathcal{T}_{1,P}$ gives a guess about $a$. If the guess is correct, then output 0; if not, output
1.

If $\mathcal{T}_{1,P}$ has output the correct guess about $a$, then $\mathcal{T}_1$ can say with high confidence that
the challenge ciphertexts are the encryptions of $(x_i^{(a)})_{i \in U}$ and therefore outputs 0. On
the other hand, if $\mathcal{T}_{1,P}$'s guess was not correct, then $\mathcal{T}_1$ can say with high confidence
that the challenge ciphertexts are the encryptions of the random collection $(x_i)_{i \in U}$, and
it outputs 1. Formally:

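Case 1. Let $(c_{i,t^*})_{i \in U} = (\mathrm{PSAEnc}_{s_i}(x_i^{(a)}))_{i \in U}$. Then $\mathcal{T}_1$ perfectly simulates P-game 1
for $\mathcal{T}_{1,P}$ and the distribution of ciphertexts is the same as in P-game 1:

$$\Pr[\mathcal{T}_1 \text{ outputs } 0] = p \cdot \Pr[\mathcal{T}_{1,P} \text{ outputs } 0 \mid a = 0] + (1 - p) \cdot \Pr[\mathcal{T}_{1,P} \text{ outputs } 1 \mid a = 1] = \Pr[\mathcal{T}_{1,P} \text{ wins P-game 1}] = P + \mu_{1,P}(\kappa).$$

Case 2. Let $(c_{i,t^*})_{i \in U} = (\mathrm{PSAEnc}_{s_i}(x_i))_{i \in U}$. Then the ciphertexts are random with the
constraint that their product is the same as in the first case. The probability that $\mathcal{T}_{1,P}$
wins P-game 1 is at most $P$ and

$$\Pr[\mathcal{T}_1 \text{ outputs } 1] = p \cdot \Pr[\mathcal{T}_{1,P} \text{ outputs } 1 \mid a = 0] + (1 - p) \cdot \Pr[\mathcal{T}_{1,P} \text{ outputs } 0 \mid a = 1] = \Pr[\mathcal{T}_{1,P} \text{ loses P-game 1}] \ge 1 - P.$$

Finally we obtain that the advantage of $\mathcal{T}_1$ in winning game 1 is

$$\mu_1(\kappa) \ge \tfrac{1}{2} \mu_{1,P}(\kappa) > \mathrm{neg}(\kappa).$$

□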



4.3 Proof of Computational Differential Privacy 

We have shown that no probabilistic polynomial-time adversary can win game 0 with non-negligible advantage if the
underlying PSA scheme is secure. If the perturbation process in game 0 is $\varepsilon$-DP-preserving,
then the whole construction provides $\varepsilon$-CDP, as we show now.

Theorem 6. Let $\mathcal{A}$ be a mechanism for a query $f_{\mathcal{D}} : \mathcal{D}^n \to O$ which preserves $\varepsilon$-DP
and let $\Sigma$ be a secure PSA scheme for $f_{\mathcal{D}}$. Then $\mathcal{A}$ preserves $\varepsilon$-CDP if it is used for the
perturbation process in game 0 instantiated with $\Sigma$.

Proof. Consider again game 1, P-game 1 and game 0. We first bound the probability
$p = \Pr[B_{1/2} = 0 \mid Y = y]$ for the biased coin in P-game 1. Since the perturbation process
was performed by $\mathcal{A}$, the random variable $Y$ corresponds to the output of $\mathcal{A}$ and we have

$$e^{-\varepsilon} \cdot \Pr[Y = y \mid B_{1/2} = 0] \le \Pr[Y = y \mid B_{1/2} = 1] \le e^{\varepsilon} \cdot \Pr[Y = y \mid B_{1/2} = 0].$$

By Bayes' formula we get

$$p_{\min} := \frac{1}{1 + e^{\varepsilon}} \le p \le \frac{e^{\varepsilon}}{1 + e^{\varepsilon}} =: p_{\max}.$$
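Now let $\mathcal{V}$ be a probabilistic polynomial-time Turing machine. Let $\mathcal{T}_{1,P}$ denote this
Turing machine as adversary in P-game 1 for any $P = \max\{p, 1 - p\}$ with
$p \in [p_{\min}, p_{\max}]$, and let $\mathcal{T}_1$ denote the same Turing machine as adversary in game 1.
Let finally $\mathcal{D}_{CDP} = \mathcal{T}_0$ denote the same machine as adversary in game 0. Then for
$a = 0, 1$:

$$\begin{aligned}
\Pr[\mathcal{D}_{CDP} = a, B_{1/2} = 1] &= \Pr[\mathcal{T}_{1,P} = a, B_p = 1] && (1) \\
&\le p_{\max} \cdot \Pr[\mathcal{T}_{1,P} = a \mid B_p = 1] \\
&= p_{\max} \cdot \Pr[\mathcal{T}_1 = a \mid B_{1/2} = 1] && (2) \\
&\le p_{\max} \cdot \Pr[\mathcal{T}_1 = a \mid B_{1/2} = 0] + \mathrm{neg}(\kappa) \\
&= p_{\max} \cdot \Pr[\mathcal{T}_{1,P} = a \mid B_p = 0] + \mathrm{neg}(\kappa) && (3) \\
&\le \frac{p_{\max}}{p_{\min}} \cdot \Pr[\mathcal{T}_{1,P} = a, B_p = 0] + \mathrm{neg}(\kappa) \\
&= e^{\varepsilon} \cdot \Pr[\mathcal{T}_{1,P} = a, B_p = 0] + \mathrm{neg}(\kappa) \\
&= e^{\varepsilon} \cdot \Pr[\mathcal{D}_{CDP} = a, B_{1/2} = 0] + \mathrm{neg}(\kappa). && (4)
\end{aligned}$$

Equations (1) and (4) hold because of Lemma 3, and Equations (2) and (3) hold because
of Lemma 5. The remaining steps use $1 - p \le p_{\max}$, $p \ge p_{\min}$ and $p_{\max}/p_{\min} = e^{\varepsilon}$; the inequality in the fourth line holds up to a negligible term because $\Sigma$ is secure in game 1. It follows that

$$\Pr[\mathcal{D}_{CDP} = a \mid B_{1/2} = 1] \le e^{\varepsilon} \cdot \Pr[\mathcal{D}_{CDP} = a \mid B_{1/2} = 0] + \mathrm{neg}(\kappa).$$

□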
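The bound $p \in [p_{\min}, p_{\max}]$ used at the start of this proof can be illustrated numerically. The sketch assumes (our choice of example) a sum query with $f(D_0) = 0$ and $f(D_1) = 1$, i.e. sensitivity 1, perturbed by two-sided geometric noise with $\alpha = e^{\varepsilon}$ (the first mechanism of Section 5), and evaluates the posterior $p = \Pr[B_{1/2} = 0 \mid Y = y]$ with Bayes' formula. For this mechanism the two bounds are attained exactly.

```python
import math

def two_sided_geometric_pmf(k: int, alpha: float) -> float:
    """Pr[Y = k] for the two-sided geometric distribution, proportional to alpha^(-|k|)."""
    return (alpha - 1) / (alpha + 1) * alpha ** (-abs(k))

def posterior_b0(y: int, eps: float) -> float:
    """p = Pr[B_{1/2} = 0 | Y = y] when f(D_0) = 0, f(D_1) = 1 and the noise is Geom(e^eps)."""
    alpha = math.exp(eps)
    like0 = two_sided_geometric_pmf(y, alpha)      # Pr[Y = y | B_{1/2} = 0]
    like1 = two_sided_geometric_pmf(y - 1, alpha)  # Pr[Y = y | B_{1/2} = 1]
    return like0 / (like0 + like1)                 # Bayes' formula with a uniform prior

eps = 0.5
p_min, p_max = 1 / (1 + math.exp(eps)), math.exp(eps) / (1 + math.exp(eps))
ps = [posterior_b0(y, eps) for y in range(-20, 21)]
print(min(ps) >= p_min - 1e-12, max(ps) <= p_max + 1e-12)  # True True
```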


As mentioned at the beginning of this section, we are considering a mechanism which
preserves $\varepsilon$-DP given some event Good. Therefore, Theorem 6 also applies to this
mechanism given Good. Accordingly, the mechanism unconditionally preserves $(\varepsilon, \delta)$-CDP,
where $\delta$ is a bound on the probability that Good does not occur.






[Figure 1: two panels, mean absolute error vs. δ (left) and vs. γ (right)]

Figure 1: Empirical error of the Geometric, Skellam and Binomial mechanisms. The
fixed parameters are ε = 0.1, S(f) = 1, β = 0.001. The left graph shows the mean of
the error in absolute value for variable δ and γ = 1 over 100 runs; the right graph is for
variable γ and δ = 0.001.


5 Mechanisms for Differential Privacy 

In this section we recall the Geometric mechanism from [27] and the Binomial mechanism
from [15], and introduce the Skellam mechanism. Since these mechanisms make use of a
discrete probability distribution, they are well-suited for execution through a secure
PSA scheme, thereby preserving computational differential privacy as shown in the previous
section.

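Theorem 7. Let $\varepsilon > 0$. For all databases $D \in \mathcal{D}^n$ the randomised mechanism

$$\mathcal{A}(D) := f(D) + Y$$

preserves $(\varepsilon, \delta)$-DP with respect to any query $f$ with sensitivity $S(f)$, if $Y$ is distributed
according to one of the following probability distributions:

1. $Y \sim \mathrm{Geom}(\alpha)$ with $\alpha = \exp(\varepsilon/S(f))$ (and $\delta = 0$) [27],

2. $Y \sim \mathrm{Bin}(n', 1/2)$ with $n' = 64 \cdot S(f)^2 \cdot \log(2/\delta)/\varepsilon^2$ [9],

3. $Y \sim \mathrm{Sk}(\mu)$ (see footnote 2) with

$$\mu = \frac{\log(1/\delta)}{1 - \cosh(\varepsilon/S(f)) + (\varepsilon/S(f)) \cdot \sinh(\varepsilon/S(f))}.$$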
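For experimentation, the three noise distributions of Theorem 7 can be sampled with the Python standard library alone. This is a sketch under stated assumptions: the two-sided geometric variable is sampled as a difference of two geometric variables (inversion sampling); the Poisson sampler is Knuth's multiplication method applied to chunks of rate at most 30, which is valid because independent Poisson variables add; $n'$ is rounded up to an integer; the Binomial noise is centred by subtracting $n'/2$; and, assuming that $\mathrm{Sk}(\mu)$ has mean 0 and variance $\mu$, we realise it as Poisson($\mu/2$) minus Poisson($\mu/2$). None of the function names come from the paper.

```python
import math
import random

def two_sided_geometric(alpha: float, rng: random.Random) -> int:
    """Sample Y with Pr[Y = k] proportional to alpha^(-|k|) as a difference of two geometrics."""
    q = 1 / alpha
    def one_sided() -> int:
        # Inversion sampling of Pr[G = k] = (1 - q) q^k on {0, 1, 2, ...}
        return int(math.log(1.0 - rng.random()) / math.log(q))
    return one_sided() - one_sided()

def binomial_half(trials: int, rng: random.Random) -> int:
    """Bin(trials, 1/2): the number of one-bits among `trials` uniform random bits."""
    return bin(rng.getrandbits(trials)).count("1")

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) as a sum of Knuth-sampled Poisson parts of rate <= 30."""
    total = 0
    while lam > 0:
        part = min(lam, 30.0)
        lam -= part
        threshold, k, prod = math.exp(-part), 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
    return total

def skellam(lam1: float, lam2: float, rng: random.Random) -> int:
    """Poisson(lam1) - Poisson(lam2): mean lam1 - lam2, variance lam1 + lam2."""
    return poisson(lam1, rng) - poisson(lam2, rng)

def theorem7_parameters(eps: float, sens: float, delta: float):
    """alpha, n' (rounded up) and mu as stated in Theorem 7."""
    x = eps / sens
    alpha = math.exp(x)
    n_prime = math.ceil(64 * sens**2 * math.log(2 / delta) / eps**2)
    mu = math.log(1 / delta) / (1 - math.cosh(x) + x * math.sinh(x))
    return alpha, n_prime, mu

rng = random.Random(0)
alpha, n_prime, mu = theorem7_parameters(eps=0.5, sens=1.0, delta=0.01)
noise_geo = two_sided_geometric(alpha, rng)
noise_bin = binomial_half(n_prime, rng) - n_prime / 2  # centred: the known bias n'/2 is removed
noise_sk = skellam(mu / 2, mu / 2, rng)                # ASSUMPTION: Sk(mu) has variance mu
```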

We provide the proof of the third claim in Appendix B. Executing these mechanisms
through a PSA scheme requires the use of the known constant $\gamma$, which denotes the a
priori estimate of the lower bound on the fraction of non-compromised users. For this
case, we provide the accuracy bounds for the aforementioned mechanisms.
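One way to use the constant $\gamma$ for the Binomial mechanism is to let every user add a small centred binomial share, sized so that any $\gamma n$ honest users together contribute at least the $n'$ fair coin flips of Theorem 7. The sketch below is a hypothetical illustration of that sizing rule (the function names and the rounding are ours); it assumes that coin flips beyond $n'$ do not weaken the privacy guarantee.

```python
import math
import random

def per_user_trials(n_prime: int, n_users: int, gamma: float) -> int:
    """Hypothetical share size: ceil(n' / (gamma * n)) fair coin flips per user."""
    return math.ceil(n_prime / (gamma * n_users))

def user_noise(trials: int, rng: random.Random) -> float:
    """One user's centred share Bin(trials, 1/2) - trials/2 (popcount of uniform bits)."""
    return bin(rng.getrandbits(trials)).count("1") - trials / 2

n_prime, n_users, gamma = 1357, 100, 0.5
trials = per_user_trials(n_prime, n_users, gamma)
honest = math.ceil(gamma * n_users)        # at least this many users add their share
print(trials, honest * trials >= n_prime)  # 28 True
```

Each share has small variance (trials/4), which matches the observation below that the Binomial and Skellam mechanisms only require every user to generate noise of small variance.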

² Sk(μ) denotes the symmetric Skellam distribution with mean 0 and variance μ. For details, see
Appendix B.



















Theorem 8. Let $\varepsilon > 0$, $0 < \delta < 1$, $S(f) > 0$ and let $0 < \gamma \le 1$ be the a priori estimate of
the lower bound on the fraction of non-compromised users in the network. By distributing
the execution of a perturbation mechanism as described above and using the parameters
from Theorem 7, we obtain $(\alpha, \beta)$-accuracy with the following parameters:


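1. $\alpha = \frac{4 \cdot S(f)}{\varepsilon} \cdot \sqrt{\frac{1}{\gamma} \cdot \log\left(\frac{1}{\delta}\right) \cdot \log\left(\frac{2}{\beta}\right)}$ for the Geometric mechanism, where $\delta$ bounds
the probability that no user has added noise [27],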

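2. $\alpha = \frac{S(f)}{\varepsilon} \cdot \sqrt{\frac{32}{\gamma} \cdot \log\left(\frac{2}{\delta}\right) \cdot \log\left(\frac{2}{\beta}\right)}$ for the Binomial mechanism,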

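3. $\alpha = \frac{S(f)}{\varepsilon} \cdot \left(\frac{1}{\gamma} \cdot \log\left(\frac{1}{\delta}\right) + \log\left(\frac{2}{\beta}\right)\right)$ for the Skellam mechanism.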
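The worked example discussed after this theorem ($S(f) = 1$, $\delta = 0.01$, $\alpha = 50$, $\beta = 0.1$, $\gamma = 1$) can be reproduced by solving the accuracy bounds for $\varepsilon$. The sketch assumes that the Geometric bound reads $\alpha = (4 S(f)/\varepsilon) \cdot \sqrt{(1/\gamma) \log(1/\delta) \log(2/\beta)}$ and the Skellam bound $\alpha = (S(f)/\varepsilon) \cdot ((1/\gamma) \log(1/\delta) + \log(2/\beta))$, with natural logarithms. Under these readings it returns $\varepsilon \approx 0.30$ and $\varepsilon \approx 0.15$, the values quoted in the text.

```python
import math

def eps_geometric(alpha, sens, delta, beta, gamma):
    """Solve alpha = (4 S(f)/eps) * sqrt(log(1/delta) * log(2/beta) / gamma) for eps."""
    return 4 * sens * math.sqrt(math.log(1 / delta) * math.log(2 / beta) / gamma) / alpha

def eps_skellam(alpha, sens, delta, beta, gamma):
    """Solve alpha = (S(f)/eps) * (log(1/delta) / gamma + log(2/beta)) for eps."""
    return sens * (math.log(1 / delta) / gamma + math.log(2 / beta)) / alpha

params = dict(alpha=50, sens=1, delta=0.01, beta=0.1, gamma=1)
print(round(eps_geometric(**params), 2), round(eps_skellam(**params), 2))  # 0.3 0.15
```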

The second claim can be easily shown using a standard tail bound for the Binomial
distribution. The proof of the third claim is provided in Appendix B.
Theorem 8 shows that for constant $\delta$, $\beta$, $\gamma$ the errors of the three mechanisms are bounded
by $O(S(f)/\varepsilon)$ and therefore do not exceed known bounds in the centralised model. As
pointed out in Section 2.3, the execution of the Geometric mechanism through a PSA
scheme requires each user to generate full noise with a small probability. In contrast,
the other two mechanisms allow all users to simply generate noise of small variance.
While the accuracy bound of the Geometric mechanism is roughly a constant factor smaller than
the bound of the Binomial mechanism, we obtain a better bound for this second approach using
the Skellam mechanism. Specifically, the ratio between the factor $\log(2/\beta) + \log(1/\delta)/\gamma$
in the accuracy of the Skellam mechanism and the factor $\sqrt{\log(2/\beta) \cdot \log(1/\delta)/\gamma}$ in the
accuracy of the Geometric mechanism goes to 0 when $\delta$ and $\beta$ go to 0. For example, fix
$S(f) = 1$, $\delta = 0.01$, $\alpha = 50$, $\beta = 0.1$, $\gamma = 1$. Then the Geometric mechanism preserves
$(\varepsilon, \delta)$-CDP with $\varepsilon \approx 0.30$, while the Skellam mechanism preserves $(\varepsilon, \delta)$-CDP with $\varepsilon \approx 0.15$.
An empirical accuracy comparison between the mechanisms is shown in Figure 1.
We observe that the errors of the Geometric and the Skellam mechanisms behave
similarly for both variables $\delta$ and $\gamma$, while the error of the Binomial mechanism
is roughly three times larger. Finally, we are able to prove our main result, Theorem 1,
which follows from the preceding analyses.

Proof of Theorem 1. The claim follows from Theorem 6 together with Theorem 2
(instantiated with the efficient construction in Example 1) and from Theorem 7 together
with Theorem 8. □

6 Conclusions 

In this work we continued a line of research opened by the work of Shi et al. [27]. By
weakening the security definition of a PSA scheme, we were able to prove that a secure
scheme (in this sense) can be built upon key-homomorphic weak PRFs. Based on the
DDH assumption, we gave an instantiation of a secure PSA scheme. If the plaintext
space is large enough, it has a substantially more efficient decryption algorithm than the
scheme in [27] at the cost of a slightly less efficient encryption algorithm, and it achieves
non-adaptive security in the standard model. Using the notion of computational
differential privacy, we provided a connection between a secure PSA scheme and a mechanism
preserving differential privacy by showing that a differentially private mechanism
preserves computational differential privacy if it is executed through a secure PSA scheme.
Moreover, we compared the accuracy of the Geometric, the Binomial and the Skellam
mechanisms, which preserve differential privacy and are suitable for execution through
a PSA scheme. While the practical performances of the Geometric and the Skellam
mechanisms are similar and both better than that of the Binomial mechanism, we were able
to provide a slightly better bound for the Skellam mechanism at high privacy levels.

A Proof of Theorem 2

Let game 1 be the security game from Definition 5 instantiated for the PSA scheme of
Theorem 2. We need to show that the advantage $\mu_1(\kappa)$ of a probabilistic polynomial-time
adversary $\mathcal{T}_1$ in winning this game is negligible in the security parameter $\kappa$. We
define the following intermediate game 2 for a probabilistic polynomial-time adversary
$\mathcal{T}_2$ and then show that winning game 1 is at least as hard as winning game 2.


Setup. The challenger runs the Setup algorithm on input security parameter $\kappa$ and
returns public parameters $pp$, time-steps $T$ and secret keys $s, s_1, \ldots, s_n$ with $s = (*_{i=1}^{n} s_i)^{-1}$. It sends $\kappa, pp, T, s$ to $\mathcal{T}_2$.

Queries. The challenger flips a random bit $b \leftarrow_R \{0, 1\}$. $\mathcal{T}_2$ chooses $U = \{i_1, \ldots, i_u\} \subseteq [n]$ and sends it to the challenger, which returns $(s_i)_{i \in [n] \setminus U}$. $\mathcal{T}_2$ is allowed to query
$(i, t, x_i)$ with $i \in U$, $t \in T$, $x_i \in \mathcal{D}$ and the challenger returns the following: if $b = 0$,
it sends $F_{s_i}(t) \cdot \varphi(x_i)$ to $\mathcal{T}_2$; if $b = 1$, it chooses

$$h_{i_1,t}, \ldots, h_{i_{u-1},t} \leftarrow_R G', \qquad h_{i_u,t} := \prod_{j=1}^{u} F_{s_{i_j}}(t) \cdot \Big(\prod_{j=1}^{u-1} h_{i_j,t}\Big)^{-1}$$

and sends $h_{i,t} \cdot \varphi(x_i)$ to $\mathcal{T}_2$.

Challenge. $\mathcal{T}_2$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made and
queries a tuple $(x_i)_{i \in U}$. If $b = 0$, the challenger sends $(F_{s_i}(t^*) \cdot \varphi(x_i))_{i \in U}$ to $\mathcal{T}_2$; if
$b = 1$, it chooses

$$h_{i_1,t^*}, \ldots, h_{i_{u-1},t^*} \leftarrow_R G', \qquad h_{i_u,t^*} := \prod_{j=1}^{u} F_{s_{i_j}}(t^*) \cdot \Big(\prod_{j=1}^{u-1} h_{i_j,t^*}\Big)^{-1}$$

and sends $(h_{i,t^*} \cdot \varphi(x_i))_{i \in U}$ to $\mathcal{T}_2$.

Queries. $\mathcal{T}_2$ is allowed to make the same type of queries as before with the restriction
that no encryption query at $t^*$ can be made.

Guess. $\mathcal{T}_2$ outputs a guess about $b$.

The adversary wins the game if it correctly guesses $b$.
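The masking used for $b = 1$ in game 2 is a one-time pad in the group $G'$: the first $u - 1$ values are uniform and the last one fixes the product. The toy sketch below uses a hypothetical group of our choosing (the quadratic residues modulo 23, with $F_{s_i}(t) = t^{s_i}$ as in Example 1 but modulo a prime instead of $p^2$). It checks that the masked values have the same product as the PRF values, so decryption is unaffected, and that $h \mapsto P \cdot h^{-1}$ is a bijection of $G'$, which is why the last value is also uniformly distributed.

```python
import random

def mask_values(prf_values, group, modulus, rng):
    """Replace PRF values by uniform group elements with the same product (case b = 1)."""
    target = 1
    for v in prf_values:
        target = target * v % modulus
    shares = [rng.choice(group) for _ in range(len(prf_values) - 1)]
    partial = 1
    for h in shares:
        partial = partial * h % modulus
    shares.append(target * pow(partial, -1, modulus) % modulus)  # h_{i_u,t} fixes the product
    return shares

# Toy group G' = quadratic residues modulo 23 (cyclic of order 11).
MOD = 23
QR = sorted({x * x % MOD for x in range(1, MOD)})
rng = random.Random(2)
t = rng.choice(QR)
keys = [rng.randrange(11) for _ in range(4)]  # keys of the u = 4 non-compromised users
prf_values = [pow(t, k, MOD) for k in keys]   # F_{s_i}(t) = t^{s_i}
hs = mask_values(prf_values, QR, MOD, rng)
```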


Lemma 9. Let $\kappa$ be a security parameter. Let $\mathcal{T}_1$ be an adversary in game 1 with
advantage $\mu_1(\kappa) > \mathrm{neg}(\kappa)$. Then there exists an adversary $\mathcal{T}_2$ in game 2 with advantage
$\mu_2(\kappa) > \mathrm{neg}(\kappa)$.

Proof. Given a successful adversary $\mathcal{T}_1$ in game 1, we construct a successful adversary
$\mathcal{T}_2$ in game 2 as follows:

Setup. Receive $\kappa, pp, T, s$ from the game 2 challenger and send them to $\mathcal{T}_1$.

Queries. Flip a random bit $b \leftarrow_R \{0, 1\}$. Receive $U = \{i_1, \ldots, i_u\} \subseteq [n]$ from $\mathcal{T}_1$ and
send it to the challenger. Forward the obtained response $(s_i)_{i \in [n] \setminus U}$ to $\mathcal{T}_1$. Forward
$\mathcal{T}_1$'s queries $(i, t, x_i)$ with $i \in U$, $t \in T$, $x_i \in \mathcal{D}$ to the challenger and forward the
obtained response $c_{i,t}$ to $\mathcal{T}_1$.

Challenge. $\mathcal{T}_1$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made and
queries two different tuples $(x_i^{(0)})_{i \in U}, (x_i^{(1)})_{i \in U}$ with $\sum_{i \in U} x_i^{(0)} = \sum_{i \in U} x_i^{(1)}$. Query
$(x_i^{(b)})_{i \in U}$ to the challenger. Obtain the response $(c_{i,t^*})_{i \in U}$ and forward it to $\mathcal{T}_1$.

Queries. $\mathcal{T}_1$ can make the same type of queries as before with the restriction that no
encryption query at $t^*$ can be made.

Guess. $\mathcal{T}_1$ gives a guess about $b$. If the guess is correct, then output 0; if not, output
1.

If $\mathcal{T}_1$ has output the correct guess about $b$, then $\mathcal{T}_2$ can say with high confidence that the
challenge ciphertexts were generated using a weak PRF and therefore outputs 0. On the
other hand, if $\mathcal{T}_1$'s guess was not correct, then $\mathcal{T}_2$ can say with high confidence that the
challenge ciphertexts were generated using random values, and it outputs 1. Formally:

Case 1. Let $(c_{i,t^*})_{i \in U} = (F_{s_i}(t^*) \cdot \varphi(x_i^{(b)}))_{i \in U}$. Then $\mathcal{T}_2$ perfectly simulates game 1 for
$\mathcal{T}_1$ and the distribution of the ciphertexts is the same as in game 1:

$$\Pr[\mathcal{T}_2 \text{ outputs } 0] = \tfrac{1}{2}\big(\Pr[\mathcal{T}_1 \text{ outputs } 0 \mid b = 0] + \Pr[\mathcal{T}_1 \text{ outputs } 1 \mid b = 1]\big) = \Pr[\mathcal{T}_1 \text{ wins game 1}] = \tfrac{1}{2} + \mu_1(\kappa).$$

Case 2. Let $(c_{i,t^*})_{i \in U} = (h_{i,t^*} \cdot \varphi(x_i^{(b)}))_{i \in U}$. Then the ciphertexts are random with the
constraint

$$\prod_{i \in U} c_{i,t^*} = \prod_{i \in U} F_{s_i}(t^*) \cdot \varphi(x_i^{(b)}),$$

such that decryption yields the same sum as in case 1. Because of the perfect security
of the one-time pad, the probability that $\mathcal{T}_1$ wins game 1 is $1/2$ and

$$\Pr[\mathcal{T}_2 \text{ outputs } 1] = \tfrac{1}{2}\big(\Pr[\mathcal{T}_1 \text{ outputs } 1 \mid b = 0] + \Pr[\mathcal{T}_1 \text{ outputs } 0 \mid b = 1]\big) = \Pr[\mathcal{T}_1 \text{ loses game 1}] = \tfrac{1}{2}.$$

Finally we obtain that the advantage of $\mathcal{T}_2$ in winning game 2 is

$$\mu_2(\kappa) = \tfrac{1}{2} \mu_1(\kappa) > \mathrm{neg}(\kappa).$$

□

For a probabilistic polynomial-time adversary $\mathcal{T}_3$, we define a new intermediate game 3
from game 2 by simply removing the plaintext dependence in each step of game 2, i.e.
in the encryption queries and in the challenge, instead of $(i, t, x_i)$ the adversary $\mathcal{T}_3$ now
just queries $(i, t)$, and the challenger in game 3 sends

$$\begin{cases} F_{s_i}(t), & \text{if } b = 0, \\ h_{i,t}, & \text{if } b = 1 \end{cases}$$

to the adversary $\mathcal{T}_3$. The rest remains the same as in game 2.

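It is easy to see that if there exists a successful adversary in game 2, then there is also
a successful adversary in game 3: it runs the adversary of game 2 and multiplies every reply by $\varphi(x_i)$ itself, since $\varphi$ is public.

Lemma 10. Let $\kappa$ be a security parameter. Let $\mathcal{T}_2$ be an adversary in game 2 with
advantage $\mu_2(\kappa) > \mathrm{neg}(\kappa)$. Then there exists an adversary $\mathcal{T}_3$ in game 3 with advantage
$\mu_3(\kappa) > \mathrm{neg}(\kappa)$.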

Remark 2. For comparison with the proof of adaptive security by Shi et al. [27], we
emphasise that in the reduction from Aggregator Obliviousness to an intermediate problem
(proof of Theorem 1 in [27]) an adversary $\mathcal{B}$ has to compute the ciphertexts $c_i = g^{x_i} H(t)^{s_i}$
for all users $i \in [n]$ and for all (!) time-steps $t$, since $\mathcal{B}$ does not know in advance for
which $i \in [n]$ it will have to use the PRF $H(t)^{s_i}$ and for which $i \in [n]$ it will have to use
truly random values. Thus, $\mathcal{B}$ has to program the random oracle $H$ in order to know, for
all $t$, the corresponding random number $z$ with $H(t) = g^z$ (where $g$ is a generator) for
simulating the original Aggregator Obliviousness game. In contrast, in the reduction for
our non-adaptive version of Aggregator Obliviousness, it is not necessary to program such
an oracle, since the simulating adversary knows in advance the set of non-compromised
users and, for all (!) $t$, it can already decide for which $i \in [n]$ it will use the PRF (which
in our case is $t^{s_i}$ instead of $H(t)^{s_i}$) and for which $i \in [n]$ it will use a truly random value.

In the next step, the problem of distinguishing the weak PRF family

$$\mathcal{F} = \{F_s : M \to G'\}_{s \in S}$$

from a random function family has to be reduced to the problem of winning game 3.
We use a hybrid argument.

Lemma 11. Let $\kappa$ be a security parameter. Let $\mathcal{T}_3$ be an adversary in game 3 with
advantage $\mu_3(\kappa)$. Then $\mu_3(\kappa) \le \mathrm{neg}(\kappa)$ if

$$\mathcal{F} = \{F_s \mid F_s : M \to G'\}_{s \in S}$$

is a weak PRF family.

Proof. We define the following sequence of hybrid games, game $3_l$ with $l = 1, \ldots, u - 1$,
for a probabilistic polynomial-time adversary $\mathcal{T}_{3_l}$.

Setup. As in game 3.

Queries. The challenger flips a random bit $b \leftarrow_R \{0, 1\}$. $\mathcal{T}_{3_l}$ chooses $U = \{i_1, \ldots, i_u\} \subseteq [n]$ and sends it to the challenger, which returns $(s_i)_{i \in [n] \setminus U}$. $\mathcal{T}_{3_l}$ is allowed to query
$(i, t)$ with $i \in U$, $t \in T$ and the challenger returns the following: if $i \notin \{i_1, \ldots, i_{l+b}\}$,
it sends $F_{s_i}(t)$ to $\mathcal{T}_{3_l}$; if $i \in \{i_1, \ldots, i_{l+b}\}$, it chooses

$$h_{i_1,t}, \ldots, h_{i_{l-(1-b)},t} \leftarrow_R G', \qquad h_{i_{l+b},t} := \prod_{j=1}^{l+b} F_{s_{i_j}}(t) \cdot \Big(\prod_{j=1}^{l-(1-b)} h_{i_j,t}\Big)^{-1}$$

and sends $h_{i,t}$ to $\mathcal{T}_{3_l}$.

Challenge. $\mathcal{T}_{3_l}$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made. The
challenger chooses

$$h_{i_1,t^*}, \ldots, h_{i_{l-(1-b)},t^*} \leftarrow_R G', \qquad h_{i_{l+b},t^*} := \prod_{j=1}^{l+b} F_{s_{i_j}}(t^*) \cdot \Big(\prod_{j=1}^{l-(1-b)} h_{i_j,t^*}\Big)^{-1}$$

and sends the following sequence to $\mathcal{T}_{3_l}$: $(h_{i_1,t^*}, \ldots, h_{i_{l+b},t^*}, F_{s_{i_{l+b+1}}}(t^*), \ldots, F_{s_{i_u}}(t^*))$.

Queries. $\mathcal{T}_{3_l}$ can make the same type of queries as before with the restriction that no
encryption query at $t^*$ can be made.

Guess. $\mathcal{T}_{3_l}$ outputs a guess about $b$.

The adversary wins the game if it correctly guesses $b$.

It is easy to see that game $3_1$ with $b = 0$ corresponds to the case $b = 0$ in game 3,
and game $3_{u-1}$ with $b = 1$ corresponds to the case $b = 1$ in game 3. Moreover, the
ciphertexts in game $3_l$ with $b = 1$ have the same distribution as the ciphertexts in game
$3_{l+1}$ with $b = 0$. Therefore

$$\Pr[\mathcal{T}_{3_{l+1}} \text{ wins game } 3_{l+1} \mid b = 0] = \Pr[\mathcal{T}_{3_l} \text{ loses game } 3_l \mid b = 1].$$
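The step from this identity to a single hybrid index can be made explicit. Write $q_l^{b}$ for the probability that the adversary outputs 0 in game $3_l$ given the challenge bit $b$ (notation introduced only for this derivation). The identity above says $q_{l+1}^{0} = q_l^{1}$, and by the correspondences with game 3 the adversary wins game 3 with probability $\frac{1}{2}(q_1^{0} + 1 - q_{u-1}^{1})$. Hence the sum telescopes:

$$\mu_{3_l}(\kappa) = \tfrac{1}{2}\big(q_l^{0} + 1 - q_l^{1}\big) - \tfrac{1}{2} = \tfrac{1}{2}\big(q_l^{0} - q_l^{1}\big), \qquad \sum_{l=1}^{u-1} \mu_{3_l}(\kappa) = \tfrac{1}{2}\big(q_1^{0} - q_{u-1}^{1}\big) = \mu_3(\kappa),$$

so there is an index $l$ with $\mu_{3_l}(\kappa) \ge \mu_3(\kappa)/(u - 1)$.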



Using a successful adversary $\mathcal{T}_{3_l}$ in game $3_l$, we construct a successful probabilistic
polynomial-time distinguisher $\mathcal{D}_{PRF}$ which has access to an oracle

$$\mathcal{O}(\cdot) \leftarrow_R \{F_{s'}(\cdot), \mathrm{rand}(\cdot)\},$$

where

$$F_{s'} : M \to G'$$

is a weak PRF and

$$\mathrm{rand} : M \to G'$$

is a truly random function. $\mathcal{D}_{PRF}$ gets $\kappa$ as input and proceeds as follows.

f. Choose two indices k\, e [n] and guess that k\, k% will be the if 1 , indices in 
U specified by the adversary 75,. This guess will be correct with probability 1/n 2 . 

2. Choose s e R S,Si e R S for all i e [n]\{fci, k^}, generate pp and T with t e R M for 
all t e T. Compute F s (t) for all t e T. 

3. Make oracle queries for t and receive 0(t) for all t e T. 

4. Send k, pp, T, s to 75,. 

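5. Queries. Receive $U \subseteq [n]$ from $\mathcal{A}_l$. If $k_1 \neq i_l$ or $k_2 \neq i_{l+1}$, then abort. Else send $(s_i)_{i \in [n] \setminus U}$ to $\mathcal{A}_l$. If $\mathcal{A}_l$ queries $(i, t)$ with $i \in U$, $t \in T$, then return the following: if $i \notin \{i_1, \ldots, i_{l+1}\}$, send $F_{s_i}(t)$ to $\mathcal{A}_l$; if $i = i_{l+1} = k_2$, send $\mathcal{O}(t)$ to $\mathcal{A}_l$; if $i \in \{i_1, \ldots, i_l\}$, choose

$$h_{i_1,t}, \ldots, h_{i_{l-1},t} \leftarrow_R G', \qquad h_{i_l,t} := \Big( F_s(t) \cdot \mathcal{O}(t) \cdot \prod_{j=1}^{l-1} h_{i_j,t} \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t) \Big)^{-1}$$

and send $h_{i,t}$ to $\mathcal{A}_l$.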

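6. Challenge. $\mathcal{A}_l$ chooses $t^* \in T$ such that no encryption query at $t^*$ was made. Choose

$$h_{i_1,t^*}, \ldots, h_{i_{l-1},t^*} \leftarrow_R G', \qquad h_{i_l,t^*} := \Big( F_s(t^*) \cdot \mathcal{O}(t^*) \cdot \prod_{j=1}^{l-1} h_{i_j,t^*} \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t^*) \Big)^{-1}$$

and send the following sequence to $\mathcal{A}_l$: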


7. Queries. $\mathcal{A}_l$ can make the same type of queries as before with the restriction that no encryption query at $t^*$ can be made.

8. Guess. $\mathcal{A}_l$ outputs a guess about whether the $i_{l+1}$-th element is random or pseudo-random. Output the same guess.³


If $\mathcal{A}_l$ has output the correct guess about whether the element is random or pseudo-random, then $\mathcal{D}_{\mathrm{PRF}}$ can distinguish between $F_{s'}(\cdot)$ and $\mathrm{rand}(\cdot)$. We now prove this result formally and show that, in this way, game $3_l$ is perfectly simulated for $\mathcal{A}_l$.

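Case 1. Let $\mathcal{O}(\cdot) = F_{s'}(\cdot)$. Define $s_{i_{l+1}} := s'$. Since $S$, $M$ are groups, there exists an element $s_{i_l}$ with

$$s_{i_l} = \Big( s * s' * \prod_{i \in [n] \setminus \{i_l, i_{l+1}\}} s_i \Big)^{-1}$$

and for all $t \in T$:

$$\Big( F_s(t) \cdot F_{s'}(t) \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t) \Big)^{-1} = \prod_{j=1}^{l} F_{s_{i_j}}(t).$$

Then for all $t \in T$ the value $h_{i_l,t}$ is equal to

$$\Big( F_s(t) \cdot \prod_{j=1}^{l-1} h_{i_j,t} \cdot F_{s'}(t) \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t) \Big)^{-1} = \prod_{j=1}^{l} F_{s_{i_j}}(t) \cdot \Big( \prod_{j=1}^{l-1} h_{i_j,t} \Big)^{-1}.$$

Therefore the distribution of the ciphertexts corresponds exactly to the case in game $3_l$ with $b = 0$.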
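The key choice in Case 1 can be checked numerically. The Python sketch below reuses a toy function $F_s(t) = H(t)^s \bmod P$ (an assumption, as are all names and parameters), takes the key group to be $\mathbb{Z}_{P-1}$ under addition, and, for simplicity, orders the users so that $i_j$ is position $j - 1$. The construction of $s_{i_l}$ makes $s * \prod_{i \in [n]} s_i$ the identity (here: the keys sum to $0$), which is what the identity relies on.

```python
import hashlib
import random

P = 2**61 - 1   # toy prime modulus (assumption; insecure)
Q = P - 1       # keys live in the additive group Z_Q (assumption)


def prf(s, t):
    """Toy key-homomorphic F_s(t) = H(t)^s mod P, so F_{s1+s2} = F_{s1} * F_{s2}."""
    h = int.from_bytes(hashlib.sha256(str(t).encode()).digest(), "big") % Q + 1
    return pow(h, s % Q, P)


def mod_prod(values):
    """Product of group elements modulo P."""
    acc = 1
    for v in values:
        acc = acc * v % P
    return acc


def case1_keys(n, l, rng):
    """Keys as in Case 1; position l - 1 is i_l and position l is i_{l+1}."""
    s = rng.randrange(Q)                  # key s sent to the adversary
    s_prime = rng.randrange(Q)            # oracle key s'
    keys = [rng.randrange(Q) for _ in range(n)]
    keys[l] = s_prime                     # s_{i_{l+1}} := s'
    rest = sum(k for i, k in enumerate(keys) if i not in (l - 1, l))
    keys[l - 1] = -(s + s_prime + rest) % Q   # s_{i_l} = (s * s' * prod s_i)^{-1}
    return s, keys


def case1_identity_holds(s, keys, l, t):
    """Compare (F_s F_{s'} prod_{i not in {i_1..i_{l+1}}} F_{s_i})^{-1} with prod_{j<=l} F_{s_{i_j}}."""
    inner = mod_prod([prf(s, t), prf(keys[l], t)] + [prf(k, t) for k in keys[l + 1:]])
    return pow(inner, -1, P) == mod_prod(prf(k, t) for k in keys[:l])
```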


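Case 2. Let $\mathcal{O}(\cdot) = \mathrm{rand}(\cdot)$. Define the random elements

$$h_{i_{l+1},t} := \mathrm{rand}(t)$$

for all $t \in T$. Since $S$, $M$ are groups, there exists an element $s' \in S$ with

$$s' = \Big( s * \prod_{i \in [n] \setminus \{i_l, i_{l+1}\}} s_i \Big)^{-1}.$$

Let $s_{i_l} \leftarrow_R S$ and $s_{i_{l+1}} := s' * s_{i_l}^{-1}$. Then for all $t \in T$:

$$\prod_{j=1}^{l+1} F_{s_{i_j}}(t) = \Big( F_s(t) \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t) \Big)^{-1}$$

and the value $h_{i_l,t}$ is equal to

$$\Big( F_s(t) \cdot h_{i_{l+1},t} \cdot \prod_{j=1}^{l-1} h_{i_j,t} \cdot \prod_{i \in [n] \setminus \{i_1, \ldots, i_{l+1}\}} F_{s_i}(t) \Big)^{-1}$$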


³ Essentially, specifying the set of non-compromised users before making any query allows $\mathcal{D}_{\mathrm{PRF}}$ to be consistent with pseudo-random values or real random values in its replies to the queries.



and equivalently

$$h_{i_l,t} = \prod_{j=1}^{l+1} F_{s_{i_j}}(t) \cdot \Big( h_{i_{l+1},t} \cdot \prod_{j=1}^{l-1} h_{i_j,t} \Big)^{-1}.$$

Therefore the distribution of the ciphertexts corresponds exactly to the case in game $3_l$ with $b = 1$.

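Without loss of generality, let

$$\Pr[\mathcal{A}_l \text{ wins game } 3_l \mid b = 0] \geq \Pr[\mathcal{A}_l \text{ loses game } 3_l \mid b = 1].$$

All in all, we obtain

$$\begin{aligned}
&\Pr[\mathcal{A}_l \text{ wins game } 3_l \mid b = 0] - \Pr[\mathcal{A}_l \text{ loses game } 3_l \mid b = 1] \\
&= \Pr[\mathcal{D}_{\mathrm{PRF}}^{F_{s'}(\cdot)}(\kappa) = 1 \mid i_l = k_1, i_{l+1} = k_2] - \Pr[\mathcal{D}_{\mathrm{PRF}}^{\mathrm{rand}(\cdot)}(\kappa) = 1 \mid i_l = k_1, i_{l+1} = k_2] \\
&\leq n^2 \cdot \big( \Pr[\mathcal{D}_{\mathrm{PRF}}^{F_{s'}(\cdot)}(\kappa) = 1] - \Pr[\mathcal{D}_{\mathrm{PRF}}^{\mathrm{rand}(\cdot)}(\kappa) = 1] \big),
\end{aligned}$$

and since $n$ is polynomial in $\kappa$, this expression is negligible by the pseudo-randomness of $F_{s'}(\cdot)$ on uniformly chosen input. Therefore, the advantage of $\mathcal{A}_l$ in winning game $3_l$ is negligible.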

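Finally, by a hybrid argument we have:

$$\begin{aligned}
\Pr[\mathcal{A} \text{ wins game } 3]
&= \tfrac{1}{2} \big( \Pr[\mathcal{A} \text{ wins game } 3 \mid b = 0] + \Pr[\mathcal{A} \text{ wins game } 3 \mid b = 1] \big) \\
&= \tfrac{1}{2} \big( \Pr[\mathcal{A}_1 \text{ wins game } 3_1 \mid b = 0] + \Pr[\mathcal{A}_{u-1} \text{ wins game } 3_{u-1} \mid b = 1] \big) \\
&= \tfrac{1}{2} + \tfrac{1}{2} \big( \Pr[\mathcal{A}_1 \text{ wins game } 3_1 \mid b = 0] - \Pr[\mathcal{A}_{u-1} \text{ loses game } 3_{u-1} \mid b = 1] \big) \\
&= \tfrac{1}{2} + \tfrac{1}{2} \sum_{l=1}^{u-1} \big( \Pr[\mathcal{A}_l \text{ wins game } 3_l \mid b = 0] - \Pr[\mathcal{A}_l \text{ loses game } 3_l \mid b = 1] \big) \\
&= \tfrac{1}{2} + \operatorname{neg}(\kappa).
\end{aligned}$$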

□ 
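The fourth equality above only uses the hybrid relation $\Pr[\mathcal{A}_{l+1} \text{ wins game } 3_{l+1} \mid b = 0] = \Pr[\mathcal{A}_l \text{ loses game } 3_l \mid b = 1]$, which makes the sum telescope. The Python sketch below checks this identity with exact rationals for arbitrary, made-up probabilities; the function name is hypothetical.

```python
from fractions import Fraction
import random


def hybrid_identity_holds(u, rng):
    """Check the telescoping step of the hybrid argument with exact rationals.

    W[l] stands for Pr[A_l wins game 3_l | b = 0] and L[l] for
    Pr[A_l loses game 3_l | b = 1], l = 1, ..., u - 1 (index 0 unused).
    The values are arbitrary, except that the hybrid relation L[l] = W[l + 1] is imposed.
    """
    W = [None] + [Fraction(rng.randint(0, 100), 100) for _ in range(u - 1)]
    L = [None] + [W[l + 1] for l in range(1, u - 1)] + [Fraction(rng.randint(0, 100), 100)]
    # 1/2 (Pr[A_1 wins 3_1 | b=0] + 1 - Pr[A_{u-1} loses 3_{u-1} | b=1])
    direct = Fraction(1, 2) * (W[1] + 1 - L[u - 1])
    telescoped = Fraction(1, 2) + Fraction(1, 2) * sum(W[l] - L[l] for l in range(1, u))
    return direct == telescoped
```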


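We can now complete the proof of Theorem 2.

Proof of Theorem 2. By the preceding lemmas,

$$2 \cdot \epsilon_2(\kappa) = 2 \cdot \epsilon_3(\kappa) = 2 \cdot (u - 1) \cdot n^2 \cdot \operatorname{neg}(\kappa) \leq 2 \cdot n^3 \cdot \operatorname{neg}(\kappa) = \operatorname{neg}(\kappa). \qquad \square$$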

B The Skellam mechanism 

B.1 Preliminaries

As observed before, distributed noise generation is feasible with a probability distribution closed under convolution. For this purpose, we recall the Skellam distribution.


Definition 9 (Skellam Distribution [23]). Let $\mu_1, \mu_2 > 0$. A discrete random variable $X$ is drawn according to the Skellam distribution with parameters $\mu_1, \mu_2$ (short: $X \sim \mathrm{Sk}(\mu_1, \mu_2)$) if it has the following probability distribution function $\psi_{\mu_1,\mu_2} : \mathbb{Z} \to \mathbb{R}$:

$$\psi_{\mu_1,\mu_2}(k) = e^{-(\mu_1 + \mu_2)} \left( \frac{\mu_1}{\mu_2} \right)^{k/2} I_k\big(2\sqrt{\mu_1 \mu_2}\big),$$

where $I_k$ is the modified Bessel function of the first kind (see pages 374–378 in [?]).

A random variable $X \sim \mathrm{Sk}(\mu_1, \mu_2)$ has variance $\mu_1 + \mu_2$ and can be generated as the difference of two random variables drawn according to the Poisson distribution of mean $\mu_1$ and $\mu_2$, respectively [23]. Note that the Skellam distribution is not symmetric in general. However, we mainly consider the particular case $\mu_1 = \mu_2 = \mu/2$ and refer to this symmetric distribution as $\mathrm{Sk}(\mu) = \mathrm{Sk}(\mu/2, \mu/2)$.
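The difference-of-Poissons representation gives a direct sampler. The Python sketch below is illustrative only: the function names are hypothetical, and it draws Poisson variables with Knuth's multiplication method (a choice made here for simplicity), which is adequate only for small means since $e^{-\lambda}$ underflows for $\lambda$ above roughly 700.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson(lam) via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_skellam(mu1, mu2, rng):
    """X ~ Sk(mu1, mu2) as the difference of independent Poisson(mu1) and Poisson(mu2)."""
    return sample_poisson(mu1, rng) - sample_poisson(mu2, rng)


def sample_symmetric_skellam(mu, rng):
    """Sk(mu) = Sk(mu/2, mu/2): mean 0 and variance mu."""
    return sample_skellam(mu / 2, mu / 2, rng)
```

Empirically, the sample mean of $\mathrm{Sk}(\mu_1, \mu_2)$ draws approaches $\mu_1 - \mu_2$ and the sample variance approaches $\mu_1 + \mu_2$.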

Lemma 12 ([23]). Let $X \sim \mathrm{Sk}(\mu_1, \mu_2)$ and $Y \sim \mathrm{Sk}(\mu_3, \mu_4)$ be independent random variables. Then $Z := X + Y$ is distributed according to $\mathrm{Sk}(\mu_1 + \mu_3, \mu_2 + \mu_4)$.
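Lemma 12 can be checked numerically by convolving two Skellam probability mass functions. In the Python sketch below (hypothetical names), the pmf is computed from the difference-of-Poissons representation rather than from the Bessel form of Definition 9, and the infinite sums are truncated at bounds assumed large enough for the small parameters used.

```python
import math


def poisson_pmf(k, lam):
    """P[N = k] for N ~ Poisson(lam), lam > 0."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def skellam_pmf(k, mu1, mu2, terms=150):
    """P[X = k] for X ~ Sk(mu1, mu2), using X = N1 - N2 with Poisson N1, N2."""
    start = max(0, -k)  # need both N1 = m + k >= 0 and N2 = m >= 0
    return sum(poisson_pmf(m + k, mu1) * poisson_pmf(m, mu2)
               for m in range(start, start + terms))


def sum_pmf(k, a, b, support=40):
    """P[X + Y = k] for independent X ~ Sk(*a), Y ~ Sk(*b), truncated to |x| <= support."""
    return sum(skellam_pmf(x, *a) * skellam_pmf(k - x, *b)
               for x in range(-support, support + 1))
```

Comparing `sum_pmf(k, (1.5, 2.0), (0.7, 3.1))` with `skellam_pmf(k, 2.2, 5.1)` gives agreement up to floating-point rounding.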

An induction step shows that the sum of $n$ i.i.d. symmetric Skellam random variables with variance $\mu$ is a symmetric Skellam random variable with variance $n\mu$. Suppose that adding symmetric Skellam noise with variance $\mu$ preserves $(\epsilon, \delta)$-DP. Recall that the network is given an a priori known estimate $\gamma$ of the lower bound on the fraction of non-compromised users. We define $\mu_{\mathrm{user}} = \mu/(\gamma n)$ and instruct the users to add symmetric Skellam noise with variance $\mu_{\mathrm{user}}$ to their own data. Even if compromised users do not add noise, at least $\gamma n$ users do, so the total noise is symmetric Skellam with variance at least $\gamma n \cdot \mu_{\mathrm{user}} = \mu$ and is still sufficient to preserve $(\epsilon, \delta)$-DP.
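The per-user calibration is simple arithmetic. The Python sketch below (hypothetical function names) computes $\mu_{\mathrm{user}} = \mu/(\gamma n)$ and the variance of the aggregate noise when a given number of users add their share, using the additivity of Lemma 12.

```python
import math


def per_user_variance(mu, n, gamma):
    """mu_user = mu / (gamma * n): variance of the Sk(mu_user) noise each user adds."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    return mu / (gamma * n)


def min_noisy_users(n, gamma):
    """Number of non-compromised (noise-adding) users guaranteed by gamma: ceil(gamma * n)."""
    return math.ceil(gamma * n)


def total_noise_variance(mu, n, gamma, noisy_users):
    """By Lemma 12, the sum of independent Sk(mu_user) terms is Sk(noisy_users * mu_user)."""
    return noisy_users * per_user_variance(mu, n, gamma)
```

For example, with $\mu = 40$, $n = 100$ and $\gamma = 1/4$, each user adds noise of variance $1.6$; the 25 guaranteed honest users already reach variance $40$, while all 100 users together produce variance $\mu/\gamma = 160$.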

For our analysis, we will use the following bound on the ratio of modified Bessel functions 
of the first kind. 

Lemma 13 ([17]). For real $k > 0$, let $I_k(\mu)$ be the modified Bessel function of the first kind and order $k$. Then

$$\frac{I_k(\mu)}{I_{k+1}(\mu)} < \frac{\mu}{-(k+1) + \sqrt{(k+1)^2 + \mu^2}}.$$
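Lemma 13 can be spot-checked numerically. The Python sketch below (hypothetical names) evaluates $I_k(x)$ for integer orders from its power series $\sum_{m \geq 0} (x/2)^{2m+k} / (m!\,\Gamma(m+k+1))$, truncated at a fixed number of terms assumed sufficient for moderate $x$, and compares the ratio with the bound.

```python
import math


def bessel_i(order, x, terms=400):
    """I_order(x) for x > 0 via its power series (terms computed in log space)."""
    log_half = math.log(x / 2)
    return sum(math.exp((2 * m + order) * log_half
                        - math.lgamma(m + 1) - math.lgamma(m + order + 1))
               for m in range(terms))


def lemma13_bound(order, x):
    """x / (-(order+1) + sqrt((order+1)^2 + x^2)), the bound on I_order(x) / I_{order+1}(x)."""
    a = order + 1
    return x / (-a + math.sqrt(a * a + x * x))
```

For instance, `bessel_i(1, 5.0) / bessel_i(2, 5.0)` stays below `lemma13_bound(1, 5.0)`.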

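For the privacy analysis of the Skellam mechanism, we need a tail bound on the symmetric Skellam distribution.

Lemma 14. Let $X \sim \mathrm{Sk}(\mu)$ and let $\sigma > 0$. Then, for all $\tau \geq -\sigma\mu$,

$$\Pr[X > \sigma\mu + \tau] \leq e^{-\mu\left(1 - \sqrt{1+\sigma^2} + \sigma \ln\left(\sigma + \sqrt{1+\sigma^2}\right)\right) - \tau \ln\left(\sigma + \sqrt{1+\sigma^2}\right)}.$$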

Proof. We use standard techniques from probability theory. Applying Markov's inequality, for any $t > 0$,

$$\Pr[X > \sigma\mu + \tau] = \Pr\big[e^{tX} > e^{t(\sigma\mu + \tau)}\big] \leq \frac{\mathbb{E}[e^{tX}]}{e^{t(\sigma\mu + \tau)}}.$$

As shown in [?], for $X \sim \mathrm{Sk}(\mu)$, the moment generating function of $X$ is

$$\mathbb{E}[e^{tX}] = e^{-\mu(1 - \cosh(t))},$$

where $\cosh(t) = (e^t + e^{-t})/2$. Hence, we have

$$\Pr[X > \sigma\mu + \tau] \leq e^{-\mu(1 - \cosh(t) + t\sigma) - t\tau}.$$

Fix $t = \ln(\sigma + \sqrt{1+\sigma^2})$. In order to conclude the proof, we observe that $\cosh\big(\ln(\sigma + \sqrt{1+\sigma^2})\big) = \sqrt{1+\sigma^2}$. □

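One can easily verify that, for $\sigma > 0$,

$$1 - \sqrt{1+\sigma^2} + \sigma \ln\left(\sigma + \sqrt{1+\sigma^2}\right) > 0,$$

since the left-hand side vanishes at $\sigma = 0$ and its derivative with respect to $\sigma$ is $\ln(\sigma + \sqrt{1+\sigma^2}) > 0$.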
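As a sanity check of Lemma 14, the Python sketch below compares the bound with the exact upper tail of $\mathrm{Sk}(\mu)$, computed by summing the probability mass function (obtained from the difference-of-Poissons representation) over a finite window assumed to capture all relevant mass. All function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P[N = k] for N ~ Poisson(lam), lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) if k >= 0 else 0.0


def sym_skellam_pmf(k, mu, terms=120):
    """P[X = k] for X ~ Sk(mu) = Sk(mu/2, mu/2), written as a difference of Poissons."""
    start = max(0, -k)
    return sum(poisson_pmf(m + k, mu / 2) * poisson_pmf(m, mu / 2)
               for m in range(start, start + terms))


def exact_upper_tail(mu, threshold, span=100):
    """Pr[X > threshold], summing the pmf over the next `span` integers."""
    first = math.floor(threshold) + 1
    return sum(sym_skellam_pmf(k, mu) for k in range(first, first + span))


def lemma14_bound(mu, sigma, tau):
    """Right-hand side of Lemma 14 (needs sigma > 0 and tau >= -sigma * mu)."""
    root = math.sqrt(1 + sigma * sigma)
    log_term = math.log(sigma + root)  # = asinh(sigma), the value of t fixed in the proof
    return math.exp(-mu * (1 - root + sigma * log_term) - tau * log_term)
```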

B.2 Analysis of the Skellam mechanism 

In this section, we provide a bound on the variance $\mu$ of the symmetric Skellam distribution (as stated in Theorem 5) that is needed in order to preserve $(\epsilon, \delta)$-differential privacy, and we compute the error that is thus introduced.


Privacy analysis 

Theorem 15. Let $\epsilon > 0$ and let $0 < \delta < 1$. For all databases $D \in \mathcal{D}^n$ the randomised mechanism

$$\mathcal{A}_{Sk}(D) := f(D) + Y$$

preserves $(\epsilon, \delta)$-DP with respect to any query $f$ of sensitivity $S(f)$, where $Y \sim \mathrm{Sk}(\mu)$ with

$$\mu \geq \frac{\log(1/\delta)}{1 - \cosh(\epsilon/S(f)) + (\epsilon/S(f)) \cdot \sinh(\epsilon/S(f))}.$$

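Proof. Let $D_0, D_1 \in \mathcal{D}^n$ be adjacent databases with $|f(D_0) - f(D_1)| \leq S(f)$. The largest ratio between $\Pr[\mathcal{A}_{Sk}(D_0) = R]$ and $\Pr[\mathcal{A}_{Sk}(D_1) = R]$ is reached when $k := R - f(D_0) = R - f(D_1) - S(f) \geq 0$, where $R$ is any possible output of $\mathcal{A}_{Sk}$. Since $\Pr[Y = m] = e^{-\mu} I_m(\mu)$ for $Y \sim \mathrm{Sk}(\mu)$, Lemma 13 gives, for all possible outputs $R$ of $\mathcal{A}_{Sk}$,

$$\frac{\Pr[\mathcal{A}_{Sk}(D_0) = R]}{\Pr[\mathcal{A}_{Sk}(D_1) = R]} = \frac{\Pr[Y = k]}{\Pr[Y = k + S(f)]} = \prod_{j=1}^{S(f)} \frac{\Pr[Y = k + j - 1]}{\Pr[Y = k + j]} < \prod_{j=1}^{S(f)} \frac{\mu}{-(k+j) + \sqrt{(k+j)^2 + \mu^2}} \leq e^{\epsilon}. \qquad (5)$$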













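Inequality (5) holds if $k < \sinh(\epsilon/S(f)) \cdot \mu - S(f)$, since this implies $k < \sinh(\epsilon/S(f)) \cdot \mu - j$ for all $j = 1, \ldots, S(f)$ and hence, because $\mu/(-a + \sqrt{a^2 + \mu^2}) = (a + \sqrt{a^2 + \mu^2})/\mu$ is increasing in $a$ and equals $e^{\epsilon/S(f)}$ at $a = \sinh(\epsilon/S(f)) \cdot \mu$,

$$\frac{\mu}{-(k+j) + \sqrt{(k+j)^2 + \mu^2}} \leq e^{\epsilon/S(f)}.$$

Applying Lemma 14 with $\sigma = \sinh(\epsilon/S(f))$ and $\tau = -S(f)$, we get

$$\Pr[k > \sinh(\epsilon/S(f)) \cdot \mu - S(f)] \leq e^{-\mu\left(1 - \cosh(\epsilon/S(f)) + (\epsilon/S(f)) \cdot \sinh(\epsilon/S(f))\right) + \epsilon},$$

and this expression is set to be smaller than or equal to $\delta$. This inequality is satisfied if

$$\mu \geq \frac{\log(1/\delta)}{1 - \cosh(\epsilon/S(f)) + (\epsilon/S(f)) \cdot \sinh(\epsilon/S(f))}.$$

□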

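Remark 3. The bound on $\mu$ from Theorem 15 is smaller than $2 \cdot (S(f)/\epsilon)^2 \cdot \log(1/\delta)$, since $1 - \cosh(x) + x \sinh(x) > x^2/2$ for all $x > 0$; thus the standard deviation of $Y \sim \mathrm{Sk}(\mu)$ is linear in $S(f)/\epsilon$ (for constant $\delta$).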
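The variance bound of Theorem 15 and the comparison in Remark 3 are easy to evaluate. The Python sketch below computes $\mu$ (with the natural logarithm), checks it against $2 (S(f)/\epsilon)^2 \log(1/\delta)$, and draws one output of $\mathcal{A}_{Sk}(D) = f(D) + Y$. The function names and the chunked Poisson sampler used to produce $Y$ are illustrative assumptions.

```python
import math
import random


def skellam_variance_bound(epsilon, delta, sensitivity):
    """Smallest mu allowed by Theorem 15 (natural logarithm)."""
    x = epsilon / sensitivity
    return math.log(1 / delta) / (1 - math.cosh(x) + x * math.sinh(x))


def remark3_bound(epsilon, delta, sensitivity):
    """The simpler quantity 2 * (S(f)/epsilon)^2 * log(1/delta) from Remark 3."""
    return 2 * (sensitivity / epsilon) ** 2 * math.log(1 / delta)


def poisson(lam, rng, chunk=20.0):
    """Poisson(lam) as a sum of Poisson(<= chunk) draws, each by Knuth's method."""
    total = 0
    while lam > 0:
        part = min(lam, chunk)
        lam -= part
        threshold, k, p = math.exp(-part), 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
    return total


def skellam_mechanism(true_answer, epsilon, delta, sensitivity, rng):
    """A_Sk(D) = f(D) + Y with Y ~ Sk(mu) = Poisson(mu/2) - Poisson(mu/2)."""
    mu = skellam_variance_bound(epsilon, delta, sensitivity)
    return true_answer + poisson(mu / 2, rng) - poisson(mu / 2, rng)
```

Splitting the Poisson mean into chunks relies on the additivity of independent Poisson variables and keeps $e^{-\lambda}$ away from underflow. Because $Y$ is integer-valued, an integer query answer remains an integer after perturbation.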


Accuracy analysis 


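Theorem 16. Let $\epsilon > 0$ and $0 < \delta < 1$. Then for all $0 < \beta < 1$ the mechanism specified in Theorem 15 has $(\alpha, \beta)$-accuracy, where

$$\alpha = \frac{S(f)}{\epsilon} \cdot \left( \log\left(\frac{2}{\beta}\right) + \log\left(\frac{1}{\delta}\right) \right).$$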




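Proof. Let

$$\mu = \frac{\log(1/\delta)}{1 - \cosh(\epsilon/S(f)) + (\epsilon/S(f)) \cdot \sinh(\epsilon/S(f))}$$

be the bound on the variance for the Skellam mechanism provided in Theorem 15. Now, as in the proof of Lemma 14, for $\alpha' > 0$,

$$\Pr[|Y| > \alpha'] = 2 \cdot \Pr[Y > \alpha'] \leq 2 \cdot e^{-\mu(1 - \cosh(\epsilon/S(f))) - (\epsilon/S(f)) \cdot \alpha'},$$

and this expression is set to be equal to $\beta$. Solving this equality for $\alpha'$ yields

$$\alpha' = \frac{S(f)}{\epsilon} \left( \log\left(\frac{2}{\beta}\right) + \mu \left( \cosh\left(\frac{\epsilon}{S(f)}\right) - 1 \right) \right) \leq \frac{S(f)}{\epsilon} \left( \log\left(\frac{2}{\beta}\right) + \log\left(\frac{1}{\delta}\right) \right) = \alpha,$$

where the inequality uses $2(\cosh(x) - 1) \leq x \sinh(x)$ for $x = \epsilon/S(f)$.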


□ 
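The accuracy statement can be checked against the exact distribution. The Python sketch below (hypothetical names) computes $\mu$, $\alpha'$ and $\alpha$ as in the proof of Theorem 16 and evaluates $\Pr[|Y| > \alpha']$ from the Skellam probability mass function, truncated to a window assumed to hold all relevant mass.

```python
import math


def poisson_pmf(k, lam):
    """P[N = k] for N ~ Poisson(lam), lam > 0."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) if k >= 0 else 0.0


def sym_skellam_pmf(k, mu, terms=250):
    """P[Y = k] for Y ~ Sk(mu), written as a difference of two Poisson(mu/2)."""
    start = max(0, -k)
    return sum(poisson_pmf(m + k, mu / 2) * poisson_pmf(m, mu / 2)
               for m in range(start, start + terms))


def two_sided_tail(mu, a, span=150):
    """Pr[|Y| > a] = 2 * Pr[Y > a] for symmetric Y ~ Sk(mu)."""
    first = math.floor(a) + 1
    return 2 * sum(sym_skellam_pmf(k, mu) for k in range(first, first + span))


def theorem16_alphas(epsilon, delta, beta, sensitivity):
    """Return (mu, alpha', alpha) as in the proof of Theorem 16 (natural logarithms)."""
    x = epsilon / sensitivity
    mu = math.log(1 / delta) / (1 - math.cosh(x) + x * math.sinh(x))  # Theorem 15 bound
    alpha_prime = (math.log(2 / beta) + mu * (math.cosh(x) - 1)) / x
    alpha = (math.log(2 / beta) + math.log(1 / delta)) / x
    return mu, alpha_prime, alpha
```

Since $\alpha'$ comes from a Chernoff-type bound, the exact two-sided tail at $\alpha'$ is typically well below $\beta$.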










For the distributed noise generation, each single user adds symmetric Skellam noise with variance $\mu_{\mathrm{user}} = \mu/(\gamma n)$ to her data. The worst case for accuracy is when all $n$ users add noise; thus the total noise $N$ is a symmetric Skellam variable with variance $\mu/\gamma$ and the accuracy becomes



proving the third claim of Theorem 5.
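As a worked sketch, assuming $\mu$ is set to the bound of Theorem 15 and the total noise is $N \sim \mathrm{Sk}(\mu/\gamma)$, repeating the steps of the proof of Theorem 16 with $\mu/\gamma$ in place of $\mu$ gives:

```latex
\Pr[|N| > \alpha'] \le 2\, e^{-(\mu/\gamma)\,(1 - \cosh(\epsilon/S(f))) - (\epsilon/S(f))\,\alpha'} = \beta
\;\Longrightarrow\;
\alpha' = \frac{S(f)}{\epsilon}\left(\log\frac{2}{\beta}
  + \frac{\mu}{\gamma}\left(\cosh\frac{\epsilon}{S(f)} - 1\right)\right)
\le \frac{S(f)}{\epsilon}\left(\log\frac{2}{\beta} + \frac{1}{\gamma}\log\frac{1}{\delta}\right).
```

The final inequality again uses $2(\cosh(x) - 1) \leq x \sinh(x)$, so in this sketch the distributed setting inflates only the $\log(1/\delta)$ term, by a factor of at most $1/\gamma$.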